What is the maximum number of threads a process can have in windows - c++

In a Windows process, is there any limit on the number of threads that can be used at a time? If so, what is the maximum number of threads per process?

There is no limit that I know of, but there are two practical limits:
The virtual address space for the stacks. For example, in a 32-bit process the virtual address space is 4 GB, but only about 2 GB is available for general use. By default each thread reserves 1 MB of stack space, so the ceiling is about 2000 threads. Naturally, you can reduce the stack size so that more threads fit (the dwStackSize parameter of CreateThread, or the /STACK linker option). On a 64-bit system this limit practically disappears.
The scheduler overhead. Once you reach thousands of threads, just scheduling them can eat most of your CPU time, so they are mostly useless anyway. This is not a hard limit; your program just gets slower and slower the more threads you create.
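The stack arithmetic above can be checked with a small sketch. The helper name max_threads_by_stack and the size constants are made up for illustration, and the calculation assumes that stack reservations are the only consumer of address space, which is never quite true in a real process.

```cpp
#include <cassert>
#include <cstdint>

const std::uint64_t KiB = 1024;
const std::uint64_t MiB = 1024 * KiB;
const std::uint64_t GiB = 1024 * MiB;

// Rough upper bound on the thread count when stack reservations are the only
// consumer of user address space (an idealized assumption: code, heaps and
// DLLs also need room, so the real figure is lower).
std::uint64_t max_threads_by_stack(std::uint64_t user_address_space_bytes,
                                   std::uint64_t stack_reserve_bytes) {
    assert(stack_reserve_bytes > 0);
    return user_address_space_bytes / stack_reserve_bytes;
}

// max_threads_by_stack(2 * GiB, 1 * MiB)  == 2048   (default 1 MB reservation)
// max_threads_by_stack(2 * GiB, 64 * KiB) == 32768  (smaller stacks, more threads)
```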

The actual limit is determined by the amount of available memory in various ways. There is no limit of "you can't have more than this many" of threads or processes in Windows, but there are limits to how much memory you can use within the system, and when that runs out, you can't create more threads.
See this blog by Mark Russinovich:
http://blogs.technet.com/b/markrussinovich/archive/2009/07/08/3261309.aspx

Related

Maximum available threads in Windows system? C++

What is the code in C++ to get the maximum number of available threads in the system?
C++ does not have the concept of a maximum number of threads.
It does have the concept of a thread failing to be created, by throwing std::system_error. This can happen for any number of reasons, including your OS deciding it doesn't want to spawn any more threads, either because you've hit a hard or soft limit on thread count, or because it could not create a thread even if it wanted to (e.g. your address space is consumed).
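As a rough sketch of what handling that failure could look like, the helper below (spawn_up_to is a hypothetical name) starts threads until either the requested count is reached or the std::thread constructor throws, then joins whatever was started. It is meant only to show where the exception surfaces, not as a recommended way to probe limits.

```cpp
#include <cassert>
#include <cstddef>
#include <system_error>
#include <thread>
#include <vector>

// Tries to start up to `wanted` threads that each run `work`, stopping at the
// first creation failure. All started threads are joined before returning.
// Returns how many threads were actually started.
template <typename Fn>
std::size_t spawn_up_to(std::size_t wanted, Fn work) {
    std::vector<std::thread> threads;
    threads.reserve(wanted);  // no reallocation while threads are being added
    try {
        for (std::size_t i = 0; i < wanted; ++i)
            threads.emplace_back(work);
    } catch (const std::system_error&) {
        // The implementation refused to create another thread;
        // carry on with the ones that did start.
    }
    for (auto& t : threads) t.join();
    return threads.size();
}
```

A call such as spawn_up_to(8, [] { /* work */ }) returns 8 on any system that can start eight threads, and a smaller number if creation failed part-way.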
The actual limit would need to be queried in an OS-specific way, outside the C++ standard. For example, on Linux one could query /proc/sys/kernel/threads-max and any relevant ulimit and compute a possible limit.
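A minimal, Linux-only sketch of that query follows. The function names are made up for illustration, it reads only threads-max, and it ignores ulimit and any cgroup limits, so treat the result as one input to an estimate rather than the limit itself.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>

// Parses a single non-negative integer from `in`; returns -1 if none can be read.
long parse_thread_limit(std::istream& in) {
    long value = -1;
    if (in >> value && value >= 0) return value;
    return -1;
}

// Linux-specific: returns the system-wide thread limit, or -1 if the file is
// missing or unreadable (for example on another OS).
long linux_threads_max() {
    std::ifstream file("/proc/sys/kernel/threads-max");
    if (!file) return -1;
    return parse_thread_limit(file);
}
```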
On Windows there is no queryable limit, and you are limited by address space. See for example "Does Windows have a limit of 2000 threads per process?" exploring this limitation.
The reason systems don't make this trivial to query is that it should not matter. You will exhaust your usable cores long before you hit any practical limit in thread count. Don't make so many threads!
std::thread::hardware_concurrency()
Returns the number of hardware thread contexts. If this value is not computable or well-defined, an implementation should return 0.
You can however create many more std::thread objects, but only this many threads will execute in parallel at any time.
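Because the function may return 0, code that sizes a set of workers from it usually wants a fallback. A small sketch, where cpu_worker_count and its default fallback of 1 are assumptions chosen for illustration:

```cpp
#include <cassert>
#include <thread>

// Returns a worker count for CPU-bound work: the reported number of hardware
// threads, or `fallback` when the implementation reports 0 (unknown).
unsigned cpu_worker_count(unsigned fallback = 1) {
    assert(fallback > 0);
    const unsigned reported = std::thread::hardware_concurrency();
    return reported != 0 ? reported : fallback;
}
```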
For OpenMP (OMP) you also have omp_get_max_threads()
Returns an integer that is equal to or greater than the number of threads that would be available if a parallel region without num_threads were defined at that point in the code.

openMP: Running with all threads in parallel leads to out-of-memory-exceptions

I want to shorten the runtime of a lengthy image processing algorithm, which is applied to multiple images, by using parallel processing with OpenMP.
The algorithm works fine with single or limited number (=2) of threads.
But: The parallel processing with openMP requires lots of memory, leading to out-of-memory-exceptions, when running with the maximum number of possible threads.
To resolve the issue, I replaced the "throwing of exceptions" with a "waiting for free memory" in case of low memory, leading to many (<= all) threads just waiting for free memory...
Is there any solution/tool/approach to dynamically maintain the memory or start threads depending on available memory?
Try compiling your program 64-bit. 32-bit programs can only have up to 2^32 = about 4GB of memory. 64-bit programs can use significantly more (2^64 which is 18 exabytes). It's very easy to hit 4GB of memory these days.
Note that if you are using more RAM than you have available, your OS will have to page some memory to disk. This can hurt performance a lot. If you get to this point (where you are using a significant portion of RAM) and still have extra cores, you would have to go deeper into the algorithm to find a more granular section to parallelize.
If you for some reason can't switch to 64-bit, you can do multiprocessing (running multiple instances of a program) so each process will have up to 4GB. You will need to launch and coordinate the processes somehow. Depending on your needs, this could mean using simple command-line arguments or complicated inter-process communication (IPC). OpenMP doesn't do IPC, but Open MPI does. Open MPI is generally used for communication between many nodes on a network, but it can be set up to run concurrent instances on one machine.
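If neither option is available, one simple way to start threads "depending on available memory" is to cap the thread count by a memory budget before the parallel region begins. The sketch below is only the arithmetic. The function name and parameters are hypothetical, and both the budget and the per-worker peak usage are values you would have to measure for your own algorithm.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of workers to start so that their combined peak memory stays within
// `budget_bytes`. Always returns at least 1 so that progress is possible.
unsigned workers_for_memory(std::uint64_t budget_bytes,
                            std::uint64_t peak_bytes_per_worker,
                            unsigned max_workers) {
    assert(peak_bytes_per_worker > 0 && max_workers > 0);
    const std::uint64_t fit = budget_bytes / peak_bytes_per_worker;
    const std::uint64_t capped = std::min<std::uint64_t>(fit, max_workers);
    return static_cast<unsigned>(std::max<std::uint64_t>(capped, 1));
}
```

With OpenMP, the result could be passed to omp_set_num_threads() or to a num_threads clause on the parallel region.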

How many threads can a C++ application create

I'd like to know, how many threads can a C++ application create at most.
Do the OS, hardware caps and other factors influence these bounds?
[C++11: 1.10/1]: [..] Under a hosted implementation, a C++ program can have more than one thread running concurrently. [..] Under a freestanding implementation, it is implementation-defined whether a program can have more than one thread of execution.
[C++11: 30.3/1]: 30.3 describes components that can be used to create and manage threads. [ Note: These threads are intended to map one-to-one with operating system threads. —end note ]
So, basically, it's totally up to the implementation & OS; C++ doesn't care!
It doesn't even list a recommendation in Annex B "Implementation quantities"! (which seems like an omission, actually).
C++ as a language does not specify a maximum (or even a minimum beyond one). A particular implementation can, but I have never seen it done directly. The OS can too, but normally it just states something vague like "limited by system resources". Each thread uses up some nonpaged memory, selector tables, and other bounded resources, so you may run out of those. Even if you don't, the system will become pretty unresponsive if the threads actually do work.
Looking from other side, real parallelism is limited by actual cores in the system, and you shall not have too many threads. Applications that could logically spawn hundreds or thousands usually start using thread pools for good practical reasons.
Basically, there are no limits at the C++ application level. The maximum number of threads is determined at the OS level (based on your architecture and the memory available).
On Linux, there is no limit on the maximum number of threads per process. The number of threads is limited system wide. You can check the maximum number of allowed threads by doing:
cat /proc/sys/kernel/threads-max
On Windows you can use the testlimit tool to check the maximum number of threads:
http://blogs.technet.com/b/markrussinovich/archive/2009/07/08/3261309.aspx
On Mac OS, please read this table to find the number of threads based on your hardware configuration
However, please keep in mind that you are on a multitasking system. The number of threads executing at the same time is limited by the total number of processor cores available. To do more, the system switches between all these threads. Each switch has a performance cost (typically a few microseconds). If your system is switching too much, it won't spend much time doing actual work and the overall system will be slow.
Generally, the limit of number of threads is the amount of memory available, but there have been systems around that have lower limits.
Unless you go mad with creating threads, it's very unlikely it will be a problem to have a limit. Creating more threads is rarely beneficial, once you reach a certain number - that number may be around the same as, or a few times higher than, the number of cores (which for real big, heavy hardware can be a few hundred these days, with 16-core processors and 8 sockets).
Threads that are CPU bound should not be more than the number of processors - nothing good comes from that.
Threads that are doing I/O or otherwise "sitting around waiting" can be higher in numbers - 2-5 per processor core seems reasonable. Given that modern machines have 8 sockets and 16 cores at the higher end of the spectrum, that's still only around 1000 threads.
Sure, it's possible to design, say, a webserver system where each connection is a thread, and the system has 10k or 20k connections active at any given time. But it's probably not the most efficient.
I'd like to know, how many threads can a C++ application create at most.
Implementation/OS-dependent.
Keep in mind that there were no threads in C++ prior to C++11.
Do the OS, hardware caps and other factors influence these bounds?
Yes.
The OS might be able to limit the number of threads a process can create.
The OS can limit the total number of threads running simultaneously (to prevent fork bombs, etc.; Linux can definitely do that).
Available physical (and virtual) memory will limit the number of threads you can create IF each thread allocates its own stack.
There can be a (possibly hardcoded) limit on how many thread "handles" the OS can provide.
The underlying OS/platform might not have threads at all (a real-mode compiler for DOS/FreeDOS or something similar).
Apart from the general impracticality of having many more threads than cores, yes, there are limits. For example, a system may keep a unique "process ID" for each thread, and there may be only 65535 of them available. Also, each thread will have its own stack, and those stacks will eventually consume too much memory (you can however adjust the size of each stack when you spawn threads).
Here's an informative article--ignore the fact that it mentions Windows, as the concepts are similar on other common systems: http://blogs.msdn.com/b/oldnewthing/archive/2005/07/29/444912.aspx
There is nothing in the C++ standard that limits number of threads. However, OS will certainly have a hard limit.
Having too many threads decreases the throughput of your application, so it's recommended that you use a thread pool.
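For readers who have not seen one, here is a minimal sketch of a fixed-size thread pool in standard C++11. The class name ThreadPool and its interface are illustrative, not taken from any library. It has no futures, no task cancellation, and no handling of a thread that fails to start or a task that throws, so treat it as an outline of the idea only.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal fixed-size pool: `count` workers pull tasks from one shared queue.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t count) {
        assert(count > 0);
        for (std::size_t i = 0; i < count; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Lets the workers drain the remaining tasks, then joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_all();
        for (auto& w : workers_) w.join();
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queues a task; some worker will run it later.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // only reachable once stopping_ is set
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

A typical use is to create one pool sized to the number of cores and call submit() for each unit of work, instead of creating one thread per unit.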

Thread limit in Unix before affecting performance

I have some questions regarding threads:
What is the maximum number of threads allowed for a process before it decreases the performance of the application?
If there's a limit, how can this be changed?
Is there an ideal number of threads that should be running in a multi-threaded application? If it depends on what the application is doing, can you cite an example?
What are the factors to consider that affects these performance/thread limit?
This is actually a hard set of questions to which there are no absolute answers, but the following should serve as decent approximations:
It is a function of your application behavior and your runtime environment, and can only be deduced by experimentation. There is usually a threshold after which your performance actually degrades as you increase the number of threads.
Usually, after you find your limits, you have to figure out how to redesign your application such that the cost-per-thread is not as high. (Note that for some domains, you can get better performance by redesigning your algorithm and reducing the number of threads.)
There is no general "ideal" number of threads, but you can sometimes find the optimal number of threads for an application on a specific runtime environment. This is usually done by experimentation, and graphing the results of benchmarks while varying the following:
Number of threads.
Buffer sizes (if the data is not in RAM) incrementing at some reasonable value (e.g., block size, packet size, cache size, etc.)
Varying chunk sizes (if you can process the data incrementally).
Various tuning knobs for the OS or language runtime.
Pinning threads to CPUs to improve locality.
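The experiment described above can be sketched as follows: run the same workload with 1, 2, ... N threads and record the wall-clock time for each. The workload here (summing a vector) and the function names are placeholders. A real benchmark would use your own workload, repeat each measurement several times, and vary the other knobs in the list as well.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` using `thread_count` threads, each handling one contiguous chunk.
std::uint64_t parallel_sum(const std::vector<std::uint32_t>& data, std::size_t thread_count) {
    assert(thread_count > 0);
    std::vector<std::uint64_t> partial(thread_count, 0);
    std::vector<std::thread> threads;
    const std::size_t chunk = (data.size() + thread_count - 1) / thread_count;
    for (std::size_t t = 0; t < thread_count; ++t) {
        const std::size_t begin = std::min(t * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        threads.emplace_back([&data, &partial, t, begin, end] {
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            partial[t] = std::accumulate(first, last, std::uint64_t{0});
        });
    }
    for (auto& th : threads) th.join();
    return std::accumulate(partial.begin(), partial.end(), std::uint64_t{0});
}

// Times parallel_sum for 1..max_threads threads. Element i of the result is
// the wall-clock time in microseconds when using i + 1 threads.
std::vector<long long> time_thread_counts(const std::vector<std::uint32_t>& data,
                                          std::size_t max_threads) {
    std::vector<long long> micros;
    for (std::size_t n = 1; n <= max_threads; ++n) {
        const auto start = std::chrono::steady_clock::now();
        volatile std::uint64_t sink = parallel_sum(data, n);  // keep the result alive
        (void)sink;
        const auto stop = std::chrono::steady_clock::now();
        micros.push_back(
            std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count());
    }
    return micros;
}
```

Plotting the returned times against the thread count usually shows the threshold mentioned above, where adding threads stops helping.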
There are many factors that affect thread limits, but the most common ones are:
Per-thread memory usage (the more memory each thread uses, the fewer threads you can spawn)
Context-switching cost (the more threads you use, the more CPU-time is spent switching).
Lock contention (if you rely on a lot of coarse-grained locking, then increasing the number of threads simply increases the contention.)
The threading model of the OS (How does it manage the threads? What are the per-thread costs?)
The threading model of the language runtime. (Coroutines, green-threads, OS threads, sparks, etc.)
The hardware. (How many CPUs/cores? Is it hyperthreaded? Does the OS loadbalance the threads appropriately, etc.)
Etc. (there are many more, but the above are the most important ones.)
The answer to your questions 1, 3, and 4 is "it's application dependent". Depending on what your threads do, you may need a different number to maximize your application's efficiency.
As to question 2, there's almost certainly a limit, and it's not necessarily something you can change easily. The number of concurrent threads might be limited per-user, or there might be a maximum number of allowed threads in the kernel.
1. There's nothing fixed: it depends what they are doing. Sometimes adding more threads to do asynchronous I/O can increase the performance of another thread with no bad side effects.
2. This is likely fixed at compile time.
3. No, it's a process architecture decision. But having at least one listener-scheduler thread besides the one or more threads doing the heavy lifting suggests the number should normally be at least two.
4. Almost certainly, your ability to really grasp what is going on. Threaded code chokes easily and in the most unexpected ways: making sure the code has no races/deadlocks is hard. Study different ways of handling concurrency, such as shared-nothing (cf. Erlang).
As long as you never have more threads using CPU time than you have cores, you will have optimal performance. But as soon as a thread has to wait for I/O, there will be unused CPU cycles, so you may want to profile your application and see what portion of its time it spends maxing out the CPU and what portion it spends waiting for RAM, hard disk, network, and other I/O. In general, if you are waiting for I/O you could have one more thread (provided that you are primarily CPU bound).
For the hard and absolute limit, check out PTHREAD_THREADS_MAX in limits.h; this may be what you are looking for. On some systems only _POSIX_THREAD_THREADS_MAX (the minimum that POSIX guarantees) is defined.
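Since the compile-time macro may be left undefined when the limit is indeterminate, a run-time query through sysconf is another option on POSIX systems. The wrapper name below is made up for illustration. On many Linux systems this call returns -1, meaning there is no fixed per-process limit to report.

```cpp
#include <cassert>
#include <unistd.h>

// POSIX-specific: returns the per-process thread limit reported by the system,
// or -1 if the limit is indeterminate or the query is not supported.
long posix_thread_limit() {
    return sysconf(_SC_THREAD_THREADS_MAX);
}
```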
Any app with more busy threads than the number of processors will cause some overall slowdown. There's an upper limit, but it varies system to system. For some, it used to be 256 and you could recompile the OS to get it a bit higher.
As long as the threads are designed to do separate tasks, there is not much of an issue. The problems start when threads share resources and a locking mechanism has to be implemented.

How much memory does a thread consume when first created?

I understand that creating too many threads in an application isn't being what you might call a "good neighbour" to other running processes, since cpu and memory resources are consumed even if these threads are in an efficient sleeping state.
What I'm interested in is this: How much memory (win32 platform) is being consumed by a sleeping thread?
Theoretically, I'd assume somewhere in the region of 1 MB (since this is the default stack size), but I'm pretty sure it's less than this, and I'm not sure why.
Any help on this will be appreciated.
(The reason I'm asking is that I'm considering introducing a thread-pool, and I'd like to understand how much memory I can save by creating a pool of 5 threads, compared to 20 manually created threads)
I have a server application which is heavy in thread usage. It uses a configurable thread pool which is set up by the customer, and in at least one site it has 1000+ threads, yet when started up it uses only 50 MB. The reason is that Windows reserves 1 MB for the stack (it reserves the address space), but that is not necessarily backed by physical memory, only a smaller part of it is. If the stack grows beyond that part, a page fault is generated and more physical memory is allocated. I don't know what the initial allocation is, but I would assume it's equal to the allocation granularity of the system (usually 64 KB). Of course, the thread would also use a little more memory for other things when created (TLS, TSS, etc.), but my guess for the total would be about 200 KB. And bear in mind that any memory that is not frequently used would be paged out by the virtual memory manager.
Adding to Fabio's comments:
Memory is your second concern, not your first. The purpose of a threadpool is usually to constrain the context switching overhead between threads that want to run concurrently, ideally to the number of CPU cores available.
A context switch is very expensive, often quoted at a few thousand to 10,000+ CPU cycles.
A little test on WinXP (32 bit) clocks in at about 15k private bytes per thread (999 threads created). This is the initial committed stack size, plus any other data managed by the OS.
If you're using Vista or Win2k8 just use the native Win32 threadpool API. Let it figure out the sizing. I'd also consider partitioning types of workloads e.g. CPU intensive vs. Disk I/O into different pools.
MSDN Threadpool API docs
http://msdn.microsoft.com/en-us/library/ms686766(VS.85).aspx
I think you'd have a hard time detecting any impact of making this kind of a change to working code - 20 threads down to 5. And then add on the added complexity (and overhead) of managing the thread pool. Maybe worth considering on an embedded system, but Win32?
And you can set the stack size to whatever you want.
This depends highly on the system:
But usually, each process is independent. Usually the system scheduler makes sure that each process gets equal access to the available processors. Thus a multi-threaded application's time is multiplexed between its threads.
Memory allocated to a thread will affect the memory available to its process but not the memory available to other processes. A good OS will page out unused stack space so it is not in physical memory. Though if your threads allocate enough memory while live, you could cause thrashing as each process's memory is paged to/from a secondary device.
I doubt a sleeping thread has much (if any) impact on the system.
It is not using any CPU
Any memory it is using can be paged out to a secondary device.
I guess this can be measured quite easily.
Get the amount of resources used by the system before creating a thread
Create a thread with default system values (default stack size and others)
Get the amount of resources after creating a thread and make the difference (with step 1).
Note that some threads need to be specified different values than the default ones.
You can try and find an average memory use by creating various number of threads (step 2).
The memory allocated by the OS when creating a thread consists of thread-local data: TCB, TLS, ...
From wikipedia: "Threads do not own resources except for a stack, a copy of the registers including the program counter, and thread-local storage (if any)."