According to the SO question "Why does Windows 10 start extra threads in my program?" and Hans Passant's answer, Windows 10 starts a thread pool for each and every C++ process (at least when compiled with VS2013 or later).
Looking at the Microsoft documentation "Thread Pools" and "Thread Pool API (2018-05-30)", I cannot find a way to join the default process thread pool.
Can I join the default process thread pool and how? ... or do I have to create a new one?
Here is a list of a few drawbacks I can see in having more than one thread pool per process:
More threads created that could have been avoided
More sleeping threads
More memory taken for additional threads and the manager itself
Less efficient thread management due to possible contention between thread pools.
If I have to create a new thread pool instead of joining/using one global thread pool per process, doesn't that remove the advantage of having one and only one thread pool per process? Why can't we check whether a thread pool has already been created and use it directly? Why not simply be able to join the main process thread pool? Wouldn't it be better to have only one thread pool, as in C#?
Side note: I'm working on a math algorithm whose computations run long enough to be worth multi-threading. It is also part of a library shipped as a third-party DLL. Being able to join an already created thread pool would seem more logical to me than creating a new one and perhaps interfering with the customer's main process threads and another potential thread pool.
After getting a good answer and great information from Raymond Chen, I discovered this article, which I'd like to share because it helped me a lot: Top 20 C++ multithreading mistakes and how to avoid them.
std::async uses the default thread pool on Windows, so you might like to use that.
More details here and here.
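As a minimal sketch of that suggestion, assuming a hypothetical ComputeChunk routine standing in for the long-running math:

#include <cmath>
#include <future>
#include <vector>

// Hypothetical stand-in for one chunk of the long-running calculation.
double ComputeChunk(int chunk)
{
    double sum = 0.0;
    for (int i = 0; i < 1000000; ++i)
        sum += std::sin(chunk + i * 0.001);
    return sum;
}

double ComputeAll(int chunkCount)
{
    std::vector<std::future<double>> futures;
    futures.reserve(chunkCount);
    for (int i = 0; i < chunkCount; ++i)
        // On MSVC, std::async runs the task on pooled threads rather than
        // spawning a dedicated thread per call, which is the point of the
        // answer above.
        futures.push_back(std::async(std::launch::async, ComputeChunk, i));

    double total = 0.0;
    for (auto& f : futures)
        total += f.get();   // block until each chunk finishes
    return total;
}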
Related
What happens if I post work to a thread pool in Boost and all the threads are already busy?
I have seen this topic (How to create a thread pool using boost in C++?) which explains how to create a pool.
I also found in this answer (https://stackoverflow.com/a/12267138/7108553) a suggestion that if I assign a job to a pool and all the threads are working, then the job is discarded.
My question is: is this the case if I create a pool in a similar way to this (How to create a thread pool using boost in C++?)? My understanding is that if I assign a job to a pool and all the threads are already working, the job is handled by the library internally, and once a thread finishes, the job is assigned to it. Is this correct?
And if not, is there an efficient way to keep track of free and busy threads?
In both links the answers use ASIO: work is posted to the io_service, which enqueues it until there is a free thread to run it.
If work increases beyond the number of available threads you can associate more threads with the IO service. Obviously there's always a point of diminishing returns.
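For illustration, a minimal sketch of that pattern, using the pre-1.66 Boost.Asio io_service API referenced in those answers:

#include <boost/asio.hpp>
#include <boost/thread.hpp>
#include <iostream>

int main()
{
    boost::asio::io_service service;

    // Queue more jobs than we will have threads; the extras simply wait in
    // the io_service's queue until a thread becomes free. Nothing is discarded.
    for (int i = 0; i < 10; ++i)
        service.post([i] { std::cout << "job " << i << " done\n"; });

    // Four pooled threads all drain the same io_service.
    boost::thread_group pool;
    for (int i = 0; i < 4; ++i)
        pool.create_thread([&service] { service.run(); });

    pool.join_all();   // run() returns once the queue is empty
    return 0;
}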
I'm using the Google Guava and Netflix Hystrix libraries in our project. Each library comes with its own pool of threads that you can configure.
That made me think about the impact: each library maintains its own pool of threads, and of course each hardware configuration has its own optimal setup.
Let's say I set up Guava with 50 threads in its pool and Hystrix with 40 in its pool. What is going to happen? Will they compete over resources?
I'm not familiar with Hystrix, but your example for Guava is:
ListeningExecutorService service =
        MoreExecutors.listeningDecorator(Executors.newFixedThreadPool(10));
In this case, Guava isn't providing a thread pool at all... you're configuring a thread pool in your code using the standard JDK methods for creating a thread pool, then wrapping that ExecutorService using a Guava method.
To (hopefully) answer your question, if you have two separate thread pools, they are in fact separate... they won't share any threads. If one has 50 threads and one has 40, you'll have 90 total threads. Again, I don't know how whatever you're doing with Hystrix works, but if it's similar to what you're doing with Guava (creating a thread pool using Executors.newFixedThreadPool(n) and passing it to something else), it's possible to just create one thread pool and have both libraries use it, in which case they will share threads.
I'm looking for an API on Windows that enables me to create and kill threads at will, and also the ability to bind threads to cores. I was introduced to the Win32 threading API here.
However, when I checked MSDN I saw _beginthreadex() and _endthreadex(). So I'm guessing there should be a call to _endthreadex every time I create a thread?
To get answers to such questions I'm looking for a tutorial on Windows Threading. Can anyone help with this?
P.S. This may be off topic, but does Boost support thread affinity too? If so, can someone point me to a tutorial/documentation related to thread affinity?
Having created a thread (such as with _beginthreadex), you need to let it exit gracefully, since you never know whether it is in the middle of something at that moment (holding a lock on a certain resource, for instance). You still have the option of blowing it away with the TerminateThread API at any time.
SetThreadAffinityMask and friends let you place your threads on specific CPUs. You may well end up leaving the OS scheduler to choose the cores your threads run on, though, as chances are high that it will be more efficient.
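A minimal sketch of the idea (MyThreadProc is just a placeholder for your worker routine):

#include <windows.h>
#include <process.h>

// Placeholder worker routine.
unsigned __stdcall MyThreadProc(void*)
{
    // ... do the work ...
    return 0;
}

int main()
{
    // Create the thread suspended so its affinity can be set before it runs.
    HANDLE hThread = reinterpret_cast<HANDLE>(
        _beginthreadex(nullptr, 0, MyThreadProc, nullptr, CREATE_SUSPENDED, nullptr));

    SetThreadAffinityMask(hThread, 1 << 2);   // pin to the third core (mask bit 2)
    ResumeThread(hThread);

    WaitForSingleObject(hThread, INFINITE);   // let the thread finish gracefully
    CloseHandle(hThread);
    return 0;
}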
Update on reusing threads: when creating a thread you pass your thread proc as its starting point, and as soon as you return from it, the thread terminates. So starting another piece of worker activity is possible in two ways: either create a new thread from scratch, or do not exit the thread proc and instead wait on synchronization objects to pick up new work requests. The latter can be implemented using IPC objects, e.g. events:
unsigned __stdcall ThreadProc(void*)
{
    // The thread does not exit after one task: it blocks on events and is
    // reused for successive work requests. g_hTerminateEvent, g_hWorkEvent
    // and DoNextRequestedTask() are assumed to be defined elsewhere.
    HANDLE events[2] = { g_hTerminateEvent, g_hWorkEvent };
    for (;;)
    {
        DWORD wait = WaitForMultipleObjects(2, events, FALSE, INFINITE);
        if (wait == WAIT_OBJECT_0)          // termination requested
            break;
        if (wait == WAIT_OBJECT_0 + 1)      // new worker activity requested
            DoNextRequestedTask();          // do the next requested task
    }
    return 0;
}
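A short usage sketch for the loop above (g_hTerminateEvent, g_hWorkEvent and ThreadProc are the names used there; in real code the work event would be paired with a queue of task descriptions):

#include <windows.h>
#include <process.h>

HANDLE g_hTerminateEvent;   // manual-reset: once set, it stays set
HANDLE g_hWorkEvent;        // auto-reset: one wake-up per SetEvent call

int main()
{
    g_hTerminateEvent = CreateEvent(nullptr, TRUE,  FALSE, nullptr);
    g_hWorkEvent      = CreateEvent(nullptr, FALSE, FALSE, nullptr);

    // Start the long-lived worker; it keeps running between tasks.
    HANDLE hWorker = reinterpret_cast<HANDLE>(
        _beginthreadex(nullptr, 0, ThreadProc, nullptr, 0, nullptr));

    SetEvent(g_hWorkEvent);        // announce that a task has been queued
    Sleep(100);                    // crude, sketch-only: let the worker pick it up

    SetEvent(g_hTerminateEvent);   // ask the worker to exit...
    WaitForSingleObject(hWorker, INFINITE);   // ...and let it finish gracefully
    CloseHandle(hWorker);
    return 0;
}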
Refer to Thread Synchronization for Beginners for sample code and description.
If you are using MFC, you may prefer CWinThread. You can send messages to the thread very easily and control its behaviour from outside. Using the thread's handle, you can provide an affinity mask via SetThreadAffinityMask, which will schedule the thread on the desired processor(s).
1) Do not mix up _beginthread/_beginthreadex and the Win32 API Function CreateThread. These are two different APIs. See Other SO Post for details.
2) If you use _beginthread/_beginthreadex, _endthread/_endthreadex should be used for termination
3) TerminateThread (and also _endthread) should not be used under normal conditions. See MSDN Post.
4) Functions such as SetThreadAffinityMask, or SetThreadIdealProcessor can be used to set the core a thread should use.
5) The Boost threading API is much more robust and simpler. It is actually the basis of C++11 threads.
I am developing a C++ application that needs to process a large amount of data. I am not in a position to partition the data so that multiple processes can handle each partition independently. I am hoping to get ideas on frameworks/libraries that can manage threads and allocate work among worker threads.
Thread management should include at least the functionality below:
1. Decide how many worker threads are required. We may need to provide a user-defined function to calculate the number of threads.
2. Create required number of threads.
3. Kill/stop unnecessary threads to reduce resource wastage.
4. Monitor healthiness of each worker thread.
Work allocation should include the functionality below:
1. Using callback functionality, the library should get a piece of work.
2. Allocate the work to an available worker thread.
3. Master/slave configuration or pipeline-of-worker-threads should be possible.
Many thanks in advance.
Your question essentially boils down to "how do I implement a thread pool?"
Writing a good thread pool is tricky. I recommend hunting for a library that already does what you want rather than trying to implement it yourself. Boost has a thread-pool library in the review queue, and both Microsoft's concurrency runtime and Intel's Threading Building Blocks contain thread pools.
With regard to your specific questions, most platforms provide a function to obtain the number of processors. In C++0x this is std::thread::hardware_concurrency(). You can then use this in combination with information about the work to be done to pick a number of worker threads.
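For example, a small sketch of picking the count (the fallback of 2 is an arbitrary choice for when the value cannot be determined):

#include <thread>

unsigned PickWorkerCount()
{
    // hardware_concurrency() may return 0 if the value is not computable,
    // so fall back to a small default in that case.
    unsigned hw = std::thread::hardware_concurrency();
    return hw != 0 ? hw : 2;
}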
Since creating threads is actually quite time consuming on many platforms, and blocked threads do not consume significant resources beyond their stack space and thread info block, I would recommend that you just block worker threads with no work to do on a condition variable or similar synchronization primitive rather than killing them in the first instance. However, if you end up with a large number of idle threads, it may be a signal that your pool has too many threads, and you could reduce the number of waiting threads.
Monitoring the "healthiness" of each thread is tricky, and typically platform dependent. The simplest way is just to check that (a) the thread is still running, and hasn't unexpectedly died, and (b) the thread is processing tasks at an acceptable rate.
The simplest means of allocating work to threads is just to use a single shared job queue: all tasks are added to the queue, and each thread takes a task when it has completed the previous task. A more complex alternative is to have a queue per thread, with a work-stealing scheme that allows a thread to take work from others if it has run out of tasks.
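A minimal sketch of such a shared queue, using C++11 primitives (tasks are simply std::function<void()> here):

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>

class JobQueue
{
public:
    void Push(std::function<void()> task)
    {
        { std::lock_guard<std::mutex> lock(m_mutex); m_tasks.push(std::move(task)); }
        m_cv.notify_one();
    }

    // Worker loop: idle threads block on the condition variable instead of
    // being destroyed, as recommended above.
    void Worker()
    {
        for (;;)
        {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_cv.wait(lock, [this] { return m_stop || !m_tasks.empty(); });
            if (m_stop && m_tasks.empty())
                return;
            auto task = std::move(m_tasks.front());
            m_tasks.pop();
            lock.unlock();
            task();   // run the task outside the lock
        }
    }

    void Stop()
    {
        { std::lock_guard<std::mutex> lock(m_mutex); m_stop = true; }
        m_cv.notify_all();
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<std::function<void()>> m_tasks;
    bool m_stop = false;
};

Each of the worker threads created at start-up simply runs Worker(); per-thread queues with work stealing, as mentioned above, are the more elaborate variant.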
If your threads can submit tasks to the work queue and wait for the results then you need to have a scheme for ensuring that your worker threads do not all get stalled waiting for tasks that have not yet been scheduled. One option is to spawn a new thread when a task gets blocked, and another is to run the not-yet-scheduled task that is blocking a given thread on that thread directly in a recursive manner. There are advantages and disadvantages with both these schemes, and with other alternatives.
I have a performance issue where clients are creating hundreds of a particular kind of object "Foo" in my C++ application's DOM. Each Foo instance has its own asynchronous work queue with its own thread. Obviously, that doesn't scale.
I need to share threads amongst work queues, and I don't want to re-invent the wheel. I need to support XP, so I can't use the Vista/Win7 thread pool. The work that needs to be done to process each queue item involves making COM calls in the multi-threaded COM apartment. The documentation for the XP thread pool says that it is okay to call CoInitializeEx() with the MTA apartment in the thread worker function callback. I've written a test app and verified that this works. I made the app run 1 million iterations with and without a CoInitializeEx/CoUninitialize pair in the WorkItem callback function. It takes 35 seconds with the CoInit* calls and 5 seconds without them. That's way too much overhead for my application. Since the thread pool is per-process and 3rd-party code runs in my process, I'm assuming it isn't safe to CoInitializeEx() once per thread and never CoUninitialize().
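For reference, a sketch of what that per-item test looks like (the names are illustrative; the pair of COM calls inside the callback is what accounts for the 30-second difference):

#include <windows.h>
#include <objbase.h>

DWORD WINAPI WorkItemCallback(LPVOID /*context*/)
{
    // Entering and leaving the MTA on every work item: this is the
    // expensive variant measured above.
    CoInitializeEx(nullptr, COINIT_MULTITHREADED);
    // ... process one queue item, making COM calls ...
    CoUninitialize();
    return 0;
}

void SubmitItem(void* item)
{
    // Hand the item to the XP-era process thread pool.
    QueueUserWorkItem(WorkItemCallback, item, WT_EXECUTEDEFAULT);
}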
Given all of that, is there any way that I can use the Win32 thread pool? Am I missing something, or is the XP thread pool pretty useless for high-performance COM applications? Am I just going to have to create my own thread-sharing system?
Have you verified what is taking so long, i.e. is it the call to CoInitializeEx()? You definitely don't need to call CoInitialize once per task. You also don't say how many threads you spawn; if you're running on a dual core and your work is CPU intensive, don't expect more than a 2x speedup, and if your work isn't CPU intensive then it's waiting on some resource (memory, disk, net) and speedups will be similarly constrained, perhaps made worse if a lock is held on that resource.
If you can use Visual Studio 2010, take a look at the Parallel Patterns Library and the Asynchronous Agents Library; there are a couple of tools there that can make this take less code to write.
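For instance, a minimal PPL sketch (WorkItem and ProcessItem are hypothetical stand-ins for the per-queue-item data and work):

#include <ppl.h>
#include <vector>

struct WorkItem { /* ... per-item data ... */ };

void ProcessItem(const WorkItem& item)
{
    // ... make the COM calls for this item ...
}

void ProcessAll(const std::vector<WorkItem>& items)
{
    // parallel_for_each fans the items out over the PPL's shared scheduler
    // instead of one dedicated thread (and queue) per Foo instance.
    concurrency::parallel_for_each(items.begin(), items.end(), ProcessItem);
}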
If you can't, you can at least try placing a token in TLS that records whether COM has been initialized on that thread, and use the presence of this token to bypass your calls to CoInitialize when they aren't needed.
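A sketch of that idea, using a C++11 thread_local flag as the token (TlsAlloc/TlsGetValue would serve the same purpose on older compilers); note it deliberately never calls CoUninitialize, which is the trade-off discussed below:

#include <windows.h>
#include <objbase.h>

// Per-thread token: set once COM has been initialized on this pool thread.
thread_local bool t_comInitialized = false;

void EnsureComOnThisThread()
{
    if (!t_comInitialized)
    {
        CoInitializeEx(nullptr, COINIT_MULTITHREADED);
        t_comInitialized = true;
    }
}

DWORD WINAPI WorkItemCallback(LPVOID /*context*/)
{
    EnsureComOnThisThread();   // cheap after the first item on this thread
    // ... process the queue item, making COM calls ...
    return 0;
}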
I'm assuming it isn't safe to CoInitializeEx() once per thread and never CoUninitialize().
Windows will clean up if a thread exits without calling CoUninitialize; we know this works because, if it didn't, there would be no cleanup when threads crash or are aborted.
So the only way this hack could cause a problem is if someone were trying to queue work items that needed an STA apartment, which seems unlikely.
I'd be tempted to go for it.