I/O Completion Port vs. QueueUserApc? - c++

Under Windows, there are two ways to queue work items without creating too many threads:
Means 1: use an I/O completion port (IOCP);
Means 2: use QueueUserApc.
However, means 1 is far more intricate than means 2.
So my question is: what are the advantages of means 1 over means 2?

When you call QueueUserApc, you must target a specific thread.
IOCP has a built-in thread dispatch mechanism that QueueUserApc lacks, which allows you to hand work to the most efficient thread out of a pool of threads. The dispatch mechanism automatically prevents too many threads from running at the same time (which causes extra context switches and extra contention) and too few threads from running at the same time (which causes poor performance).
Windows actually keeps track of the number of threads running IOCP jobs. It initially sets the number of threads it allows to run equal to the number of virtual cores on the machine. However, if a thread blocks for I/O or synchronization, another thread blocked on the IOCP port is automatically released, avoiding thread starvation.
In addition, IOCP can be easily hooked up to I/O so that I/O events trigger dispatches of threads blocked on the IOCP port. This is the most efficient way to do I/O to a large number of destinations on Windows.

Related

Non-Overlapped Serial - Do ReadFile() calls from separate threads block each other?

I've inherited a large code base that contains multiple serial interface classes to various hardware components. Each of these serial modules uses non-overlapped serial I/O for its communication. I have an issue where I get random CPU spikes to 100% which cause the threads to stall briefly, and then the CPU goes back to normal usage after ~10-20 seconds.
My theory is that, due to the blocking nature of non-overlapped serial I/O, there are times when multiple threads are calling ReadFile() and blocking each other.
My question is: if multiple threads are calling ReadFile() (or WriteFile()) at the same time, will they block each other? Based on my research I believe that's true, but I would like confirmation.
The platform is Windows XP running C++03, so I don't have many modern tools available.
"if multiple threads are calling readFile() (or writeFile()) at the same time will they block each other?"
As far as I'm concerned, they will block each other.
I suggest you could refer to the Doc:Synchronization and Overlapped Input and Output
When a function is executed synchronously, it does not return until
the operation has been completed. This means that the execution of the
calling thread can be blocked for an indefinite period while it waits
for a time-consuming operation to finish. Functions called for
overlapped operation can return immediately, even though the operation
has not been completed. This enables a time-consuming I/O operation to
be executed in the background while the calling thread is free to
perform other tasks.
Using the same event on multiple threads can lead to a race condition
in which the event is signaled correctly for the thread whose
operation completes first and prematurely for other threads using that
event.
Also, the operating system is in charge of the CPU; your code only gets to run when the operating system schedules it. The OS will not bother running threads that are blocked: blocking does not occupy the CPU. I suggest you try the Windows Performance Toolkit to check CPU utilization.

Is there a way to find out, whether a thread is blocked?

I'm writing a thread pool class in C++ which receives tasks to be executed in parallel. I want all cores to be busy, if possible, but sometimes some threads are idle because they are blocked for a time for synchronization purposes. When this happens I would like to start a new thread, so that there are always approximately as many threads awake as there are CPU cores. For this purpose I need a way to find out whether a certain thread is awake or sleeping (blocked). How can I find this out?
I'd prefer to use the C++11 standard library or boost for portability purposes. But if necessary I would also use WinAPI. I'm using Visual Studio 2012 on Windows 7. But really, I'd like to have a portable way of doing this.
Preferably this thread-pool should be able to master cases like
MyThreadPool pool;
for ( int i = 0; i < 100; ++i )
    pool.addTask( &block_until_this_function_has_been_called_a_hundred_times );
pool.join(); // waits until all tasks have been dispatched.
where the function block_until_this_function_has_been_called_a_hundred_times() blocks until 100 threads have called it. At this time all threads should continue running. One requirement for the thread-pool is that it should not deadlock because of a too low number of threads in the pool.
Add a facility to your thread pool for a thread to say "I'm blocked" and then "I'm no longer blocked". Before every significant blocking action (see below for what I mean by that) signal "I'm blocked", and then "I'm no longer blocked" afterwards.
What constitutes a "significant blocking action"? Certainly not a simple mutex lock: mutexes should only be held for a short period of time, so blocking on a mutex is not a big deal. I mean things like:
Waiting for I/O to complete
Waiting for another pool task to complete
Waiting for data on a shared queue
and other similar events.
Use Boost Asio. It has its own thread pool management and scheduling framework. The basic idea is to push tasks to the io_service object using the post() method, and call run() from as many threads as you have CPU cores. You should create a work object while the calculation is running to prevent the threads from exiting when they run out of jobs.
The important thing about Asio is never to use any blocking calls. For I/O, use the asynchronous calls of Asio's own I/O objects. For synchronization, use strand objects instead of mutexes. If you post functions wrapped in a strand to the io_service, it ensures that at any time at most one task belonging to that strand runs. If there is a conflict, the task remains in Asio's event queue instead of blocking a worker thread.
There is one drawback to asynchronous programming, though: code that is scattered across several asynchronous callbacks is much harder to read than code with a clear control flow. You should be aware of this when designing your program.

Setting thread priorities from the running process

I've just come across the Get/SetThreadPriority methods and they got me wondering - can a thread priority meaningfully be set higher than the owning process priority (which I don't believe can be changed programmatically in the same way)?
Are there any pitfalls to using these APIs?
Yes, you can set the thread priority to any class, including a class higher than the one of the current process. In fact, these two values are complementary and provide the base priority of the thread. You can read about it in the Remarks section of the link you posted.
You can set the process priority using SetPriorityClass.
Now that we got the technicalities out of the way, I find little use for manipulating the priority of a thread directly. The OS scheduler is sophisticated enough to boost the priority of threads blocked in I/O over threads doing CPU computations (to the point that an I/O thread will preempt a CPU thread when the I/O interrupt arrives). In fact, even I/O threads are differentiated, with keyboard I/O threads getting a priority boost over file I/O threads for example.
On Windows, the thread and process priorities are combined using an algorithm that decides overall scheduling priority:
Windows priorities
Pitfalls? Well:
Raising the priority of a thread is likely to give the greatest overall gain if it is usually blocked on I/O but must run ASAP after being signaled by its driver, e.g. video I/O that must process buffers quickly.
Raising the priority of threads is likely to have the greatest overall negative impact if they are CPU-bound and raised to a high priority, preventing normal-priority threads from running. Taken to extremes, OS threads and utilities like Task Manager will not run.

Multi-threaded Event Dispatching

I am developing a C++ application that will use Lua scripts for external add-ons. The add-ons are entirely event-driven; handlers are registered with the host application when the script is loaded, and the host calls the handlers as the events occur.
What I want to do is to have each Lua script running in its own thread, to prevent scripts from locking up the host application. My current intention is to spin off a new thread to execute the Lua code, and allow the thread to terminate on its own once the code has completed. What are the potential pitfalls of spinning off a new thread as a form of multi-threaded event dispatching?
Here are a few:
Unless you take some steps to that effect, you are not in control of the lifetime of the threads (they can stay running indefinitely) or the resources they consume (CPU, etc)
Messaging between threads and synchronized access to commonly accessible data will be harder to implement
If you are expecting a large number of add-ons, the overhead of creating a thread for each one might be too great
Generally speaking, giving event-driven APIs a new thread to run on strikes me as a bad decision. Why have threads running when they don't have anything to do until an event is raised? Consider spawning one thread for all add-ons, and managing all event propagation from that thread. It will be massively easier to implement and when the bugs come, you will have a fighting chance.
Creating a new thread and destroying it frequently is not really a good idea. For one, you should have a way to bound this so that it doesn't consume too much memory (think stack space, for example) or reach the point where lots of pre-emption happens because the threads are competing for time on the CPU. Second, you will waste a lot of work creating new threads and tearing them down. (This depends on your operating system: on some OSs thread creation is cheap, on others it is expensive.)
It sounds like what you are seeking to implement is a work queue. I couldn't find a good Wikipedia article on this but this comes close: Thread pool pattern.
One could go on for hours talking about how to implement this, and different concurrent queue algorithms that can be used. But the idea is that you create N threads which will drain a queue, and do some work in response to items being enqueued. Typically you'll also want the threads to, say, wait on a semaphore while there are no items in the queue -- the worker threads decrement this semaphore and the enqueuer will increment it. To prevent enqueuers from enqueueing too much while worker threads are busy and hence taking up too much resources, you can also have them wait on a "number of queue slots available" semaphore, which the enqueuer decrements and the worker thread increments. These are just examples, the details are up to you. You'll also want a way to tell the threads to stop waiting for work.
My 2 cents: depending on the number and rate of events generated by the host application, the main problem I can see is performance. Creating and destroying threads has a cost [performance-wise]. I'm assuming that each thread, once spawned, does not need to share any resources with the other threads, so there is no contention.
If all threads are assigned to a single core of your CPU and there is no load balancing, you can easily overload one core and leave the others [on a multicore system] idle. I'd consider some thread affinity + load balancing policy.
Another problem could be resources [read: memory]. How much memory will each Lua thread consume?
Be very careful about memory leaks in the Lua threads as well: if events are frequent and threads are created/destroyed frequently, leaving leaked memory behind, you can exhaust your host's memory quite soon ;)

Possible frameworks/ideas for thread managment and work allocation in C++

I am developing a C++ application that needs to process large amount of data. I am not in position to partition data so that multi-processes can handle each partition independently. I am hoping to get ideas on frameworks/libraries that can manage threads and work allocation among worker threads.
Thread management should include at least the functionality below.
1. Decide how many worker threads are required. We may need to provide a user-defined function to calculate the number of threads.
2. Create the required number of threads.
3. Kill/stop unnecessary threads to reduce resource wastage.
4. Monitor the health of each worker thread.
Work allocation should include the functionality below.
1. Using a callback, the library should get a piece of work.
2. Allocate the work to an available worker thread.
3. A master/slave configuration or a pipeline of worker threads should be possible.
Many thanks in advance.
Your question essentially boils down to "how do I implement a thread pool?"
Writing a good thread pool is tricky. I recommend hunting for a library that already does what you want rather than trying to implement it yourself. Boost has a thread-pool library in the review queue, and both Microsoft's concurrency runtime and Intel's Threading Building Blocks contain thread pools.
With regard to your specific questions, most platforms provide a function to obtain the number of processors. In C++11 this is std::thread::hardware_concurrency(). You can then combine this with information about the work to be done to pick the number of worker threads.
Since creating threads is actually quite time consuming on many platforms, and blocked threads do not consume significant resources beyond their stack space and thread info block, I would recommend that you just block worker threads with no work to do on a condition variable or similar synchronization primitive rather than killing them in the first instance. However, if you end up with a large number of idle threads, it may be a signal that your pool has too many threads, and you could reduce the number of waiting threads.
Monitoring the "health" of each thread is tricky, and typically platform dependent. The simplest way is just to check that (a) the thread is still running and hasn't unexpectedly died, and (b) the thread is processing tasks at an acceptable rate.
The simplest means of allocating work to threads is just to use a single shared job queue: all tasks are added to the queue, and each thread takes a task when it has completed the previous task. A more complex alternative is to have a queue per thread, with a work-stealing scheme that allows a thread to take work from others if it has run out of tasks.
If your threads can submit tasks to the work queue and wait for the results then you need to have a scheme for ensuring that your worker threads do not all get stalled waiting for tasks that have not yet been scheduled. One option is to spawn a new thread when a task gets blocked, and another is to run the not-yet-scheduled task that is blocking a given thread on that thread directly in a recursive manner. There are advantages and disadvantages with both these schemes, and with other alternatives.