I know that in apparent concurrency, multiple threads share the CPU and take turns executing, while in true concurrency multiple tasks execute simultaneously on different CPUs at the same time.
Am I correct?
User-level threads have apparent concurrency: if any thread makes a blocking system call, the entire process is blocked, i.e. all the threads within the process are blocked. This happens because the operating system does not know that there are multiple threads, as they are implemented by a library.
Kernel-level threads have true concurrency: the kernel can recognize that there are multiple threads, so if one thread blocks, other threads get picked up and can run concurrently.
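As a rough illustration of the kernel-level case, here is a minimal C++ sketch (assuming a mainstream platform where std::thread maps to a kernel thread): one thread blocks, here simulated with a sleep, while the other keeps making progress.

    #include <chrono>
    #include <iostream>
    #include <thread>

    int main() {
        // This thread "blocks" (a blocking read() or recv() would behave
        // the same way for the purposes of this demonstration).
        std::thread blocker([] {
            std::this_thread::sleep_for(std::chrono::seconds(1));
            std::cout << "blocker woke up\n";
        });

        // Because the OS schedules each kernel thread independently,
        // this thread keeps running while the other one is blocked.
        std::thread worker([] {
            for (int i = 0; i < 5; ++i)
                std::cout << "worker still running: " << i << "\n";
        });

        blocker.join();
        worker.join();
    }

With purely library-implemented (user-level) threads and no I/O-aware scheduler, the blocking call would stall the whole process instead.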
You're looking for the difference between Concurrency and Parallelism.
The former is multiple processes/threads sharing a single core; this is what you refer to as "apparent concurrency". Parallelism means multiple instructions are actually running at the same time.
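If you want to see how many threads your hardware can actually run in parallel, a quick C++ check (the value is only a hint, and may be 0 if it cannot be computed):

    #include <iostream>
    #include <thread>

    int main() {
        // Number of hardware threads that can truly run in parallel.
        std::cout << std::thread::hardware_concurrency() << "\n";
    }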
I'm learning about multithreading and I understand the difference between parallelism and concurrency. My question is: if I create a second thread and detach it from the parent thread, do these 2 run concurrently or in parallel?
Typically (for a modern OS) there are about 100 processes with maybe an average of 2 threads each, for a total of 200 threads just for background services, GUI, software updaters, etc. You start your process (1 more thread) and it spawns its second thread, so now there are 202 threads.
Out of those 202 threads almost all of them will be blocked waiting for something to happen most of the time. If 30 threads are not blocked, and you have 8 CPUs, then 30 threads compete for those 8 CPUs.
If 30 threads compete for 8 CPUs, then maybe 4 threads are high priority and get a whole CPU for themselves and 10 threads are low priority and don't get any CPU time because there's more important work for CPUs to do; and maybe 12 threads are medium priority and share 4 CPUs by time multiplexing (frequently switching between threads). How this actually works depends on the OS and its scheduler (it's very different for different operating systems).
Of course the number of threads and their priorities changes often (e.g. as threads are created and terminated), and threads block and unblock very often, so how many threads are competing for CPUs (not blocked) is constantly changing. Maybe there's 30 threads competing for 8 CPUs at one point in time, and 2 milliseconds later there's 5 threads and 3 CPUs are idle.
My question is, if I create a second thread and detach it from the parent thread, do these 2 run concurrently or in parallel?
Yes; your 2 threads may either run concurrently (share a CPU with time multiplexing) or run in parallel (on different CPUs); and can do both at different times (concurrently for a while, then parallel for a while, then..).
if I create a second thread and detach it...
Detach is not something your program can do to a thread. It's merely something the program can do to a std::thread object. The object is not the thread. The object is just a handle that your program can use to talk about the thread, and "detach" just grants permission for the program to destroy the handle, while the thread continues to run.
...from the parent thread
So, "detach" doesn't detach one thread from another (e.g., a "child" from its "parent,") It detaches a thread from its std::thread handle.
FYI: Most programming environments do not recognize any parent/child relationship between threads. If thread A creates thread B, that does not give thread A any special privileges or capabilities with respect to thread B that threads C, D, and E don't also have.
That's not to say that you can't recognize the relationship. It might mean something in your program. It just doesn't mean anything to the OS.
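For concreteness, here is a minimal sketch of that distinction; backgroundWork is a made-up task used only for illustration:

    #include <chrono>
    #include <iostream>
    #include <thread>

    void backgroundWork() {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        std::cout << "still running after the handle was dropped\n";
    }

    int main() {
        std::thread t(backgroundWork); // t is a handle, not the thread itself
        t.detach(); // we may now destroy the handle; the thread runs on

        // Give the detached thread time to finish before main() returns,
        // since nothing will join it.
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }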
...Parallel or concurrently
That's not an either-or choice. Parallel is concurrent.
"Concurrency" does not mean "threads context switching." Rather, it's a statement about the order in which the threads access and update shared variables.
When two threads run concurrently on typical multiprocessing hardware, their actions will be serializable. That means that the outcome of the program will be the same as if some single, imaginary thread did all the same things that the real program threads did, and it did them one-by-one, in some particular order.
Two threads are concurrent with each other if the serialization is non-deterministic. That is to say, if the "particular order" in which the things all seem to happen is not entirely determined by the program. They are concurrent if different runs of the program can behave as if the imaginary single thread chose different serializations.
There's also a simpler test, which usually amounts to the same thing: Two threads probably are concurrent with each other if both threads are started before either one of them finishes.
Any way you look at it though, if two threads are truly running in parallel with each other, then they must also be concurrent with each other.
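To make the non-determinism concrete, here is a small sketch: two threads append to a shared, mutex-protected log. Every run is serializable (both entries always end up in the log), but which entry comes first varies from run to run, so the threads are concurrent with each other.

    #include <iostream>
    #include <mutex>
    #include <string>
    #include <thread>
    #include <vector>

    int main() {
        std::vector<std::string> log;
        std::mutex m;

        auto append = [&](const std::string& who) {
            std::lock_guard<std::mutex> lock(m);
            log.push_back(who);
        };

        std::thread a(append, "A");
        std::thread b(append, "B");
        a.join();
        b.join();

        // The outcome is always some serialization ("A" then "B", or
        // "B" then "A"), but the program does not determine which:
        // the two threads are concurrent with each other.
        for (const auto& entry : log) std::cout << entry << "\n";
    }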
Do they run in parallel?
@Brendan already answered that one. TLDR: if the computer has more than one CPU, then they potentially can run in parallel. How much of the time they actually spend running in parallel depends on many things though, and many of those things are in the domain of the operating system.
I have a main thread which does some not-so-heavy work, and I'm also creating worker threads which do very heavy work. All documentation and examples show how to create a number of threads equal to std::thread::hardware_concurrency(). But since the main thread already exists, the number of threads becomes std::thread::hardware_concurrency() + 1. For example:
my machine supports 2 hardware threads.
in the main thread I create these 2 threads, and the total number of threads becomes 3.
the core with the main thread does its job plus (probably) a worker's job.
Of course I don't want this, because the UI (which runs in the main thread) becomes unresponsive due to the latency. What will happen if I create std::thread::hardware_concurrency() - 1 threads? Will it guarantee that the main thread, and only the main thread, runs on a single core? How can I check it?
P.S.: I'm using some sort of pool - I start the threads at program start and stop them on exit. During execution, all worker threads run an infinite while loop.
As others have written in the comments, you should carefully consider whether you can do a better job than the OS.
That being said, it is technically possible:
Use the native_handle method to get the OS's handle to your thread.
Consult your OS's documentation for setting the thread affinity. E.g., using pthreads on Linux, you'd want pthread_setaffinity_np (see the sketch below).
This gives you full control over where each thread runs. In particular, you can give one of the threads a core of its own.
Note that this isn't part of the standard, as this level of control is not portable. That might serve as another hint that it's possibly not what you're looking for.
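A minimal sketch of those steps, assuming Linux, where std::thread::native_handle() yields a pthread_t and the GNU extension pthread_setaffinity_np is available (g++ defines _GNU_SOURCE by default); pinToCore is a helper name invented for this sketch:

    #include <pthread.h> // GNU extension: pthread_setaffinity_np
    #include <sched.h>   // cpu_set_t, CPU_ZERO, CPU_SET
    #include <thread>

    // Pin a std::thread to a single core (Linux-specific sketch).
    bool pinToCore(std::thread& t, int core) {
        cpu_set_t cpuset;
        CPU_ZERO(&cpuset);
        CPU_SET(core, &cpuset);
        return pthread_setaffinity_np(t.native_handle(),
                                      sizeof(cpu_set_t), &cpuset) == 0;
    }

    int main() {
        std::thread worker([] { /* very-heavy work here */ });
        pinToCore(worker, 1); // worker gets core 1; if every worker is kept
                              // off core 0, the main (UI) thread has it to itself
        worker.join();
    }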
No - std::thread::hardware_concurrency() only gives you a hint about the number of cores potentially available for multithreading. You might be interested in CPU affinity masks (putting threads on different CPUs). This works at the pthread level, which you can reach via std::thread::native_handle (http://en.cppreference.com/w/cpp/thread/thread/native_handle).
Depending on your OS, you can get the threads' native handles and control their priority levels using pthread_setschedparam(), for example giving the worker threads a lower priority than the main thread (a sketch follows below). This can be one solution to the UI problem. In general, the number of threads need not match the number of available HW cores.
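A hedged sketch of that approach; lowerPriority is a name invented here, the available policies and priority ranges are OS-specific, and real-time policies such as SCHED_RR usually require elevated privileges, so treat this as illustration only:

    #include <pthread.h>
    #include <sched.h>
    #include <thread>

    // Illustrative helper: give a worker thread the minimum priority of
    // the SCHED_RR policy. On Linux, ordinary SCHED_OTHER threads are
    // tuned with nice values instead; check return codes in real code.
    void lowerPriority(std::thread& worker) {
        sched_param param{};
        param.sched_priority = sched_get_priority_min(SCHED_RR);
        pthread_setschedparam(worker.native_handle(), SCHED_RR, &param);
    }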
There are definitely cases where you want to be able to gain full control, and reliably analyze what is going on. You are using Windows, but as an example, it is possible on a multicore machine to exclude e.g. one core from the normal Linux OS scheduler, and use that core for time-critical hard real-time tasks. In essence, you will own that core and handle interrupts for it, thereby enabling something close to hard real-time response times and predictability. Requires careful programming and analysis, and takes a significant effort. But very attractive if done right.
I am a little bit confused about multithreading. Normally we create multiple threads to break the main process into subtasks, to achieve responsiveness and to remove waiting time.
But here I have a situation where I have to execute the same task using multiple threads in parallel.
My processor can execute 4 threads in parallel, so will it improve performance if I create more than 4 threads (10 or more)? When I put this question to my colleague, he said nothing bad will happen: we already run many threads in other applications (browser threads, kernel threads, etc.), so he told me to create multiple threads for the same task.
But if I create more than 4 threads that are supposed to execute in parallel, won't that cause more context switches and decrease performance?
Or, even though we create multiple threads to execute in parallel, will they just execute one after the other, so the performance stays the same?
So what should I do in the above situations, and are these assumptions correct?
edit
1 thread worked: time to process was 120 seconds.
2 threads worked: time to process was about 60 seconds.
3 threads created: time to process was about 60 seconds (no change from 2 threads).
Is it because my hardware can only run 2 threads at once (being dual-core)?
Software thread = a piece of code to run.
Hardware thread = a core (processor) that runs a software thread.
So my CPU supports only 2 concurrent threads; if I purchase an AMD CPU with 8 or 12 cores, can I achieve higher performance?
Multi-Tasking is pretty complex and performance gains usually depend a lot on the problem itself:
Only a part of the application can be done in parallel (there is always a first part that splits up the work into multiple tasks). So the first question is: how much of the work can be done in parallel and how much of it needs to be synchronized? (In some cases, you can stop here because so little can be done in parallel that the whole effort isn't worth it.)
Multiple tasks may depend on each other (one task may need the result of another task). These tasks cannot be executed in parallel.
Multiple tasks may work on the same data/resources (read/write situations). Here we need to synchronize access to those data/resources. If all tasks need write access to the same object during the WHOLE process, then we cannot work in parallel (see the sketch after this list).
Basically this means that without the exact definition of the problem (dependencies between tasks, dependencies on data, amount of parallel tasks, ...) it's very hard to tell how much performance you'll gain by using multiple threads (and if it's really worth it).
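As a small illustration of the last point, a hedged sketch: if each task spends essentially all of its time holding the same mutex, adding threads buys nothing, because the work is serialized anyway.

    #include <mutex>
    #include <thread>
    #include <vector>

    int main() {
        std::mutex m;
        long shared = 0;

        auto task = [&] {
            for (int i = 0; i < 1000000; ++i) {
                // The entire body runs under the lock, so the threads
                // merely take turns: concurrency, but no parallelism.
                std::lock_guard<std::mutex> lock(m);
                shared += 1;
            }
        };

        std::vector<std::thread> threads;
        for (int i = 0; i < 4; ++i) threads.emplace_back(task);
        for (auto& t : threads) t.join();
        // Wall-clock time will be roughly the same as (or worse than)
        // running a single thread.
    }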
http://en.wikipedia.org/wiki/Amdahl%27s_law
Amdahl's law states, in a nutshell, that the performance boost you receive from parallel execution is limited by the part of your code that must run sequentially.
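In formula form, with p the fraction of the work that can be parallelized and n the number of processors, the maximum speedup is:

    S(n) = \frac{1}{(1 - p) + \frac{p}{n}}

So even if 90% of the work parallelizes perfectly (p = 0.9), the speedup can never exceed 1 / (1 - 0.9) = 10, no matter how many cores you add.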
Without knowing your problem space here are some general things you should look at:
Refactor to eliminate mutex/locks. By definition they force code to run sequentially.
Reduce context switch overhead by pinning threads to physical cores. This becomes more complicated when threads must wait for work (i.e. blocking on IO), but in general you want to keep your cores as busy as possible running your program, not switching between threads.
Unless you absolutely need to use raw threads and sync primitives, try using a task scheduler or parallel algorithms library to parallelize your work (see the sketch below). Examples would be Intel TBB, Thrust or Apple's libDispatch.
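As one hedged example of that last suggestion, C++17's parallel algorithms let the library split the work across cores for you (this assumes a standard library with parallel-execution support; with GCC you typically also link against TBB):

    #include <algorithm>
    #include <execution>
    #include <vector>

    int main() {
        std::vector<double> data(10000000, 1.0);

        // The library partitions the range and schedules the chunks onto
        // its own worker threads; no manual thread or mutex management.
        std::for_each(std::execution::par, data.begin(), data.end(),
                      [](double& x) { x = x * 2.0 + 1.0; });
    }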
I'm writing a multi-threaded application that uses pthreads. It has multiple threads (more than I have cores) to handle several connections and data processing. I would like to organize the threads so that 3 cores (out of 4) are dedicated to 3 critical tasks, while all the non-critical tasks share the remaining core. What I am worried about is whether the 3 dedicated cores will be slowed down by the thrashing that takes place on the non-critical core. How can I set this up? At first I thought that if I pthread_create() one thread and, inside that thread, pthread_create() all my non-critical threads, this would accomplish my goal, but I can't find proof of that anywhere. Thanks!
I use the QueueUserWorkItem() function to invoke the thread pool.
And I queued a lot of work items with it (about 30000).
But according to the task manager, my application only makes 4~5 threads after I push the start button.
I read on MSDN that the default thread limit is about 500.
Why are only a few threads made in my application?
I'm trying to speed up my application, and I suspect this thread pool is one of the reasons it is slow.
thanks
It is important to understand how the threadpool scheduler works. It was designed to fine-tune the number of running threads against the capabilities of your machine. Your machine can probably run only two threads at the same time; dual-core CPUs are the current standard. Maybe four.
So when you dump a bunch of threads in its lap, it starts out by activating only two threads. The rest of them wait in a queue for CPU cores to become available. As soon as one of those two threads completes, it activates another one. Twice a second, it evaluates what's going on with active threads that didn't complete. It makes the rough assumption that those threads are blocked and thus not making progress, and allows another thread to activate. You've now got three running threads. Getting up to 500 threads, the default maximum, will take 249 seconds.
Clearly, this behavior spells out what a thread should do to be suitable to run as a threadpool thread: it should complete quickly and not block often. Note that blocking on I/O requests is dealt with separately.
If this behavior doesn't suit you then you can use a regular Thread. It will start running right away and compete with other threads in your program (and the operating system) for CPU time. Creating 30,000 such threads is not possible; there isn't enough virtual memory available for that. A 32-bit operating system poops out somewhere south of 2000 threads, consuming all available virtual memory. You can get about 50,000 threads on a 64-bit operating system before the paging file runs out. Testing these limits in a production program is not recommended.
I think you may have misunderstood the use of the threadpool. Spawning threads and killing threads involves the Windows kernel and is an expensive operation. If you continuously created threads to perform asynchronous operations and then threw them away, you would be making many such system calls.
So the threadpool is actually a group of threads which are created once and which, instead of exiting when they complete a task, wait for another item queued by QueueUserWorkItem. The threadpool then tunes itself based on how many threads your process needs concurrently. If you wish to test this, write this code:
for (int i = 0; i < 30000; i++)
{
    // myMethod must match the WaitCallback delegate:
    //     void myMethod(object state)
    ThreadPool.QueueUserWorkItem(myMethod);
}
You will see this create a whole bunch of threads. Maybe not 30000, though, as some of the threads that are created will be reused as the ThreadPool starts to work through your function calls.
The threadpool is there so you can avoid creating a thread for every asynchronous operation for the very reason that threads are expensive. If you want 30,000 threads you're going to use a lot of memory for the thread stacks plus waste a lot of CPU time doing context switches. Now creating that many threads would be justified if you had 30,000 CPU cores...