I'm writing a multi-threaded application that uses pthreads. It has multiple threads (more than I have cores) to handle several connections and data processing. I would like to organize the threads so that 3 cores (out of 4) are dedicated to 3 critical tasks, while all non-critical tasks share the remaining core. What I am worried about is whether the 3 dedicated cores will be slowed down by the thrashing that takes place on the non-critical core. How can I set this up? At first I thought that if I pthread_create() one thread and had it pthread_create() all my non-critical threads, this would accomplish my goal, but I can't find proof of that anywhere. Thanks!
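(Note: the thread-creation hierarchy has no effect on scheduling; a thread created by another thread is scheduled like any other. The usual tool for this is CPU affinity. Below is a minimal sketch, assuming Linux/glibc, where the nonportable pthread_setaffinity_np() is available; the task functions are hypothetical placeholders.)

#define _GNU_SOURCE 1                    /* for pthread_setaffinity_np */
#include <pthread.h>
#include <sched.h>

/* Restrict a thread to a single core. */
static void pin_to_core(pthread_t t, int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(t, sizeof(set), &set);
}

/* Hypothetical stand-ins for the real critical and non-critical work. */
static void *critical_task(void *arg)   { (void)arg; /* ... */ return NULL; }
static void *background_task(void *arg) { (void)arg; /* ... */ return NULL; }

int main(void) {
    pthread_t crit[3], other[8];
    for (int i = 0; i < 3; ++i) {        /* one critical thread per core 0..2 */
        pthread_create(&crit[i], NULL, critical_task, NULL);
        pin_to_core(crit[i], i);
    }
    for (int i = 0; i < 8; ++i) {        /* all non-critical threads on core 3 */
        pthread_create(&other[i], NULL, background_task, NULL);
        pin_to_core(other[i], 3);
    }
    for (int i = 0; i < 3; ++i) pthread_join(crit[i], NULL);
    for (int i = 0; i < 8; ++i) pthread_join(other[i], NULL);
    return 0;
}

(Each thread may run briefly before its affinity is applied; pthread_attr_setaffinity_np() on the creation attributes avoids that window. Also, this only pins your own threads; other processes can still be scheduled on cores 0-2 unless you restrict them too, e.g. with cpusets or the isolcpus kernel option.)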
Related:
I’m learning about multi threading and I understand the difference between parallelism and concurrency. My question is, if I create a second thread and detach it from the parent thread, do these 2 run concurrently or in parallel?
Typically (for a modern OS) there's about 100 processes with maybe an average of 2 threads each for a total of 200 threads just for background services, GUI, software updaters, etc. You start your process (1 more thread) and it spawns its second thread, so now there's 202 threads.
Out of those 202 threads almost all of them will be blocked waiting for something to happen most of the time. If 30 threads are not blocked, and you have 8 CPUs, then 30 threads compete for those 8 CPUs.
If 30 threads compete for 8 CPUs, then maybe 4 threads are high priority and get a whole CPU for themselves, and 10 threads are low priority and don't get any CPU time because there's more important work for the CPUs to do; and maybe 12 threads are medium priority and share the remaining 4 CPUs by time multiplexing (frequently switching between threads). How this actually works depends on the OS and its scheduler (it's very different for different operating systems).
Of course the number of threads and their priorities changes often (e.g. as threads are created and terminated), and threads block and unblock very often, so how many threads are competing for CPUs (not blocked) is constantly changing. Maybe there's 30 threads competing for 8 CPUs at one point in time, and 2 milliseconds later there's 5 threads and 3 CPUs are idle.
My question is, if I create a second thread and detach it from the parent thread, do these 2 run concurrently or in parallel?
Yes; your 2 threads may either run concurrently (share a CPU with time multiplexing) or run in parallel (on different CPUs); and can do both at different times (concurrently for a while, then parallel for a while, then..).
if I create a second thread and detach it...
Detach is not something your program can do to a thread. It's merely something the program can do to a std::thread object. The object is not the thread. The object is just a handle that your program can use to talk about the thread, and "detach" just grants permission for the program to destroy the handle, while the thread continues to run.
...from the parent thread
So, "detach" doesn't detach one thread from another (e.g., a "child" from its "parent,") It detaches a thread from its std::thread handle.
FYI: Most programming environments do not recognize any parent/child relationship between threads. If thread A creates thread B, that does not give thread A any special privileges or capabilities with respect to thread B that threads C, D, and E don't also have.
That's not to say that you can't recognize the relationship. It might mean something in your program. It just doesn't mean anything to the OS.
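To make the handle-vs-thread distinction concrete, here is a minimal sketch: after detach(), the std::thread object no longer refers to the thread, but the thread itself carries on until its function returns.

#include <chrono>
#include <iostream>
#include <thread>

int main() {
    std::thread t([] {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        std::cout << "still running after the handle was detached\n";
    });
    t.detach();  // gives up the handle; the thread itself is unaffected
    // The std::thread object may now be destroyed, but the thread runs on.
    std::this_thread::sleep_for(std::chrono::seconds(1));  // crude wait, demo only
    return 0;
}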
...Parallel or concurrently
That's not an either-or choice. Parallel is concurrent.
"Concurrency" does not mean "threads context switching." Rather, it's a statement about the order in which the threads access and update shared variables.
When two threads run concurrently on typical multiprocessing hardware, their actions will be serializable. That means that the outcome of the program will be the same as if some single, imaginary thread did all the same things that the real program threads did, and it did them one-by-one, in some particular order.
Two threads are concurrent with each other if the serialization is non-deterministic. That is to say, if the "particular order" in which the things all seem to happen is not entirely determined by the program. They are concurrent if different runs of the program can behave as if the imaginary single thread chose different serializations.
There's also a simpler test, which usually amounts to the same thing: Two threads probably are concurrent with each other if both threads are started before either one of them finishes.
Any way you look at it though, if two threads are truly running in parallel with each other, then they must also be concurrent with each other.
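To make that concrete, a minimal sketch where the serialization is non-deterministic: both orderings are valid, so different runs may print different values.

#include <atomic>
#include <iostream>
#include <thread>

int main() {
    std::atomic<int> last{0};           // shared variable both threads write
    std::thread a([&] { last = 1; });
    std::thread b([&] { last = 2; });
    a.join();
    b.join();
    // The outcome is as if the two stores happened one-by-one, but the order
    // (a before b, or b before a) is not determined by the program, so this
    // may print 1 on one run and 2 on another: the threads are concurrent.
    std::cout << last << '\n';
    return 0;
}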
Do they run in parallel?
@Brendan already answered that one. TL;DR: if the computer has more than one CPU, then they potentially can run in parallel. How much of the time they actually spend running in parallel depends on many things though, and many of those things are in the domain of the operating system.
Erlang is very efficient at spawning new processes, but what is the point if the CPU can only execute e.g. 4 of them in parallel?
Therefore the rest have to wait for an Erlang "context switch".
Do you get more things done faster if you have, for example, 10k processes than you would using Java/C#/C++?
There are many reasons:
Conceptually, processes are easy to reason about. Asynchronous callbacks and promises in languages like JavaScript are harder to reason about because the code in the callbacks can change the values of variables used by other code in the thread.
Processes provide isolation for the code running inside them. A process can only affect other processes by placing messages in their mailboxes. A process cannot meddle with the state of other processes.
Processes are granular. This means:
If you have 400 processes on a 4-core machine, the scheduler will make sure to distribute them across its scheduler threads in such a way as to fully utilize the 4 cores. One core is always going to be handling OS stuff, so the scheduler would likely end up giving the thread running on that core less work than the other 3 threads. But it adapts, so in any situation the scheduler will do its best to make sure processes wait as little as possible and threads always have a queue of processes waiting for CPU time.
Moving to better hardware with more cores doesn't require changes to the code or architecture of the application. Moving your Erlang application from a 4-core machine to a 64-core machine means it will run roughly 16 times faster without any changes, assuming your application is structured in such a way that it can take advantage of the extra cores (usually this means making sure tasks that could be done in parallel are executed in separate processes).
Processes are very lightweight, so there is very little overhead. In most applications the benefits provided by processes and the scheduler far outweigh the small overhead from running thousands of processes. Commodity hardware can easily handle hundreds of thousands of processes.
So in closing, whether or not processes execute in parallel isn't that important. The other benefits they provide are enough to justify their usage.
I know that in apparent concurrency, multiple threads share the CPU and take turns executing, while in true concurrency multiple tasks execute simultaneously on different CPUs.
Am I correct?
User-level threads have apparent concurrency: if any thread makes a blocking system call, the entire process is blocked, i.e. all threads within the process are blocked. This happens because the operating system does not know there are multiple threads, as they are implemented by a library.
Kernel-level threads have true concurrency: the OS can recognize that there are multiple threads, so if one thread blocks, other threads can be picked up and run concurrently.
You're looking for the difference between Concurrency and Parallelism.
The former is multiple processes/threads sharing a single core; this is what you refer to as "apparent concurrency". Parallelism has multiple instructions actually running at the same time.
I am a little bit confused about multithreading. We create multiple threads to break the main process into subtasks, to achieve responsiveness and to remove waiting time.
But here I have a situation where I have to execute the same task using multiple threads in parallel.
My processor can execute 4 threads in parallel, so will it improve performance if I create more than 4 threads (10 or more)? When I put this question to my colleague, he said nothing bad will happen; we are already executing many threads in other applications (browser threads, kernel threads, etc.), so he told me to create multiple threads for the same task.
But won't creating more than 4 threads that execute in parallel cause more context switches and decrease performance?
Or, even though we create multiple threads to execute in parallel, will they just execute one after the other, so the performance stays the same?
So what should I do in the above situation, and are these assumptions correct?
Edit:
1 thread: time to process was 120 seconds.
2 threads: time to process was about 60 seconds.
3 threads: time to process was about 60 seconds (no change from 2 threads).
Is it because my hardware can only run 2 threads at once (being dual-core)?
Software thread = piece of code.
Hardware thread = core (processor) for running a software thread.
So my CPU supports only 2 concurrent threads; if I purchase an AMD CPU with 8 or 12 cores, can I achieve higher performance?
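(For reference, C++ can report how many threads the hardware supports concurrently; a minimal sketch:)

#include <iostream>
#include <thread>

int main() {
    // Number of concurrent threads the implementation supports, e.g. 2 on a
    // dual-core CPU without hyper-threading; may return 0 if unknown.
    std::cout << std::thread::hardware_concurrency() << '\n';
    return 0;
}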
Multi-Tasking is pretty complex and performance gains usually depend a lot on the problem itself:
Only part of the application can be done in parallel (there is always a sequential part that splits the work up into multiple tasks). So the first question is: how much of the work can be done in parallel, and how much of it needs to be synchronized? (In some cases you can stop here, because so little can be done in parallel that the whole effort isn't worth it.)
Multiple tasks may depend on each other (one task may need the result of another task). These tasks cannot be executed in parallel.
Multiple tasks may work on the same data/resources (read/write situation). Here we need to synchronize access to this data/these resources; if all tasks need write access to the same object during the WHOLE process, then we cannot work in parallel (see the sketch after this list).
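As an illustration of that last read/write case, a minimal sketch in C++: the mutex makes the shared counter correct, but the contended section runs one thread at a time, which is exactly what limits the parallel gain.

#include <iostream>
#include <mutex>
#include <thread>

int main() {
    long counter = 0;                    // shared data both tasks write
    std::mutex m;                        // serializes access to counter
    auto task = [&] {
        for (int i = 0; i < 100000; ++i) {
            std::lock_guard<std::mutex> lock(m);
            ++counter;                   // critical section: no parallelism here
        }
    };
    std::thread t1(task), t2(task);
    t1.join();
    t2.join();
    std::cout << counter << '\n';        // always 200000, but almost no speedup
    return 0;
}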
Basically this means that without an exact definition of the problem (dependencies between tasks, dependencies on data, number of parallel tasks, ...) it's very hard to tell how much performance you'll gain by using multiple threads (and whether it's really worth it).
http://en.wikipedia.org/wiki/Amdahl%27s_law
Amdahl's law states, in a nutshell, that the performance boost you receive from parallel execution is limited by the part of your code that must run sequentially.
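In formula form, if p is the fraction of the work that can be parallelized and n is the number of processors, the maximum speedup is

S(n) = 1 / ((1 - p) + p / n)

For example, with p = 0.9, four cores give S(4) = 1 / (0.1 + 0.225) ≈ 3.1, and even infinitely many cores cap out at 1 / (1 - p) = 10.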
Without knowing your problem space here are some general things you should look at:
Refactor to eliminate mutex/locks. By definition they force code to run sequentially.
Reduce context-switch overhead by pinning threads to physical cores. This becomes more complicated when threads must wait for work (i.e. blocking on IO), but in general you want to keep your cores as busy as possible running your program, not switching between threads.
Unless you absolutely need to use threads and sync primitives, try using a task scheduler or parallel-algorithms library to parallelize your work. Examples would be Intel TBB, Thrust, or Apple's libDispatch (see the sketch below).
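For instance, a minimal sketch of that style using Intel TBB (assuming TBB is available; the per-element work is a placeholder). The library's scheduler decides how many worker threads to run, typically one per core, so there are no explicit threads or locks in the code:

#include <tbb/parallel_for.h>
#include <cstddef>
#include <vector>

int main() {
    std::vector<double> data(1000000, 1.0);
    // TBB splits the index range into chunks and runs them on worker threads.
    tbb::parallel_for(std::size_t(0), data.size(), [&](std::size_t i) {
        data[i] *= 2.0;                  // placeholder for real per-element work
    });
    return 0;
}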
I use the QueueUserWorkItem() function to queue work to the thread pool.
I tried submitting lots of work items with it (about 30000),
but according to Task Manager my application only makes 4~5 threads after I push the start button.
I read on MSDN that the default thread limit is about 500.
Why are only a few threads made in my application?
I'm trying to speed up my application and I suspect this threadpool is one of the reasons it is slow.
Thanks.
It is important to understand how the threadpool scheduler works. It was designed to fine-tune the number of running threads against the capabilities of your machine. Your machine can probably run only two threads at the same time; dual-core CPUs are the current standard. Maybe four.
So when you dump a bunch of threads in its lap, it starts out by activating only two of them. The rest wait in a queue for CPU cores to become available. As soon as one of those two threads completes, it activates another one. Twice a second, it evaluates what's going on with active threads that didn't complete. It makes the rough assumption that those threads are blocking and thus not making progress, and allows another thread to activate. You've now got three running threads. Getting up to 500 threads, the default maximum, would take 249 seconds.
Clearly, this behavior spells out what a thread should do to be suitable to run as a threadpool thread: it should complete quickly and not block often. Note that blocking on I/O requests is dealt with separately.
If this behavior doesn't suit you then you can use a regular Thread. It will start running right away and compete with other threads in your program (and the operating system) for CPU time. Creating 30,000 such threads is not possible; there isn't enough virtual memory available for that. A 32-bit operating system poops out somewhere south of 2000 threads, consuming all available virtual memory. You can get about 50,000 threads on a 64-bit operating system before the paging file runs out. Testing these limits in a production program is not recommended.
I think you may have misunderstood the use of the threadpool. Spawning and killing threads involves the Windows kernel and is an expensive operation. If you continuously needed threads to perform an asynchronous operation and then threw them away, you would be making many system calls.
So the threadpool is actually a group of threads which are created once and which, instead of exiting when they complete a task, go back to waiting for another item from QueueUserWorkItem. The threadpool will then tune itself based on how many threads your process needs concurrently. If you wish to test this, try this code:
// myMethod must match the WaitCallback delegate: void myMethod(object state)
for (int i = 0; i < 30000; i++)
{
    ThreadPool.QueueUserWorkItem(myMethod);
}
You will see this creates a whole bunch of threads. Maybe not 30000, as some of the threads that were created will be reused as the ThreadPool works through your function calls.
The threadpool is there so you can avoid creating a thread for every asynchronous operation, for the very reason that threads are expensive. If you wanted 30,000 threads you'd use a lot of memory for the thread stacks (at Windows' default 1 MB stack reserve per thread, roughly 30 GB of address space) plus waste a lot of CPU time doing context switches. Now creating that many threads would be justified if you had 30,000 CPU cores...