MPI_Barrier in different threads, behaviour? [duplicate] - c++

This question already has an answer here:
Does a call to MPI_Barrier affect every thread in an MPI process?
(1 answer)
Closed 9 years ago.
Let's say I have 2 processes, each with two threads (1 I/O thread, 1 compute thread).
I am interested in using an I/O library (ADIOS).
I am wondering what would happen if I coded something like this:
Let's say the I/O threads in the 2 processes do some I/O, and they use
MPI_Barrier(MPI_COMM_WORLD) at some point B to synchronize the
I/O.
The compute threads in the two processes also use MPI_Barrier(MPI_COMM_WORLD) at some point A to synchronize the computation (while the I/O threads are working).
---> I don't know exactly what might happen. Is the following case possible:
Process 1, I/O thread waits at B
Process 2, compute thread waits at A
=> Processes 1 and 2 get synchronized with each other, so Process 1 leaves the barrier at B and Process 2 leaves it at A (even though the two processes did not synchronize at the same point in the code)?
If that can happen, isn't this unwanted behavior that the programmer never intended? Can it be avoided by using two different communicators over the same set of processes (MPI_Comm_dup(...))?
Or is the barrier really tied to the code line? If so, how would that be implemented?
This is confusing!
Thanks a lot!

The first scenario is very likely to happen (barrier calls from different threads matching each other). From MPI's point of view, a barrier must be entered by all ranks in the communicator, no matter which thread makes the call or at which code line the call appears. MPI has no notion of thread identity: all threads together are treated as a single entity, a rank. The only special treatment is that when the MPI_THREAD_MULTIPLE thread support level is provided, the library must implement proper locking so that MPI calls can be made from any thread at any time.
That's why it is highly advisable that parallel library authors always duplicate the world communicator and use the duplicate for their internal communication. That way the library code won't interfere with the user code (with some special exceptions that could still result in deadlocks).

Related

When threads are created, do they run in parallel or concurrently?

I’m learning about multi threading and I understand the difference between parallelism and concurrency. My question is, if I create a second thread and detach it from the parent thread, do these 2 run concurrently or in parallel?
Typically (for a modern OS) there's about 100 processes with maybe an average of 2 threads each for a total of 200 threads just for background services, GUI, software updaters, etc. You start your process (1 more thread) and it spawns its second thread, so now there's 202 threads.
Out of those 202 threads almost all of them will be blocked waiting for something to happen most of the time. If 30 threads are not blocked, and you have 8 CPUs, then 30 threads compete for those 8 CPUs.
If 30 threads compete for 8 CPUs, then maybe 4 threads are high priority and get a whole CPU to themselves, and 10 threads are low priority and don't get any CPU time because there's more important work for the CPUs to do; and maybe 12 threads are medium priority and share 4 CPUs by time multiplexing (frequently switching between threads). How this actually works depends on the OS and its scheduler (it's very different for different operating systems).
Of course the number of threads and their priorities changes often (e.g. as threads are created and terminated), and threads block and unblock very often, so how many threads are competing for CPUs (not blocked) is constantly changing. Maybe there's 30 threads competing for 8 CPUs at one point in time, and 2 milliseconds later there's 5 threads and 3 CPUs are idle.
My question is, if I create a second thread and detach it from the parent thread, do these 2 run concurrently or in parallel?
Yes; your 2 threads may either run concurrently (share a CPU with time multiplexing) or run in parallel (on different CPUs); and can do both at different times (concurrently for a while, then parallel for a while, then..).
if I create a second thread and detach it...
Detach is not something your program can do to a thread. It's merely something the program can do to a std::thread object. The object is not the thread. The object is just a handle that your program can use to talk about the thread, and "detach" just grants permission for the program to destroy the handle, while the thread continues to run.
...from the parent thread
So, "detach" doesn't detach one thread from another (e.g., a "child" from its "parent"). It detaches a thread from its std::thread handle.
FYI: Most programming environments do not recognize any parent/child relationship between threads. If thread A creates thread B, that does not give thread A any special privileges or capabilities with respect to thread B that threads C, D, and E don't also have.
That's not to say that you can't recognize the relationship. It might mean something in your program. It just doesn't mean anything to the OS.
...Parallel or concurrently
That's not an either-or choice. Parallel is concurrent.
"Concurrency" does not mean "threads context switching." Rather, it's a statement about the order in which the threads access and update shared variables.
When two threads run concurrently on typical multiprocessing hardware, their actions will be serializable. That means that the outcome of the program will be the same as if some single, imaginary thread did all the same things that the real program threads did, and it did them one-by-one, in some particular order.
Two threads are concurrent with each other if the serialization is non-deterministic. That is to say, if the "particular order" in which the things all seem to happen is not entirely determined by the program. They are concurrent if different runs of the program can behave as if the imaginary single thread chose different serializations.
There's also a simpler test, which usually amounts to the same thing: Two threads probably are concurrent with each other if both threads are started before either one of them finishes.
Any way you look at it though, if two threads are truly running in parallel with each other, then they must also be concurrent with each other.
Do they run in parallel?
@Brendan already answered that one. TL;DR: if the computer has more than one CPU, then they potentially can run in parallel. How much of the time they actually spend running in parallel depends on many things though, and many of those things are in the domain of the operating system.

Running only a given amount of threads at any given time in C++

I have a function in C++ that takes a random amount of time to return a result (from 0.001 to 10 seconds). I would like to run this function N ~ 10^5 independent times, so it is natural to run lots of threads.
My question is how I can run only 10 threads at a time. By this I mean that I would like to start 10 threads and launch a new one only when another finishes. I tried launching 10^5 threads and letting the computer figure it out, but it doesn't work. Any suggestions?
You can either use std::async and let the system manage the number of threads,
or you can set up a limited number of threads (a thread pool), preferably sized with std::thread::hardware_concurrency(). There are many ways to do this...
You need to start reading documentation of:
std::promise, std::future (std::shared_future)
std::packaged_task
C++ Concurrency In Action has an excellent chapter 9 on thread pools.
One good advanced idea is to make a thread-safe function queue (the simplest such queue can be obtained by wrapping std::queue in a class with a much-reduced interface that locks a std::mutex before pushing and before popping/returning the popped value) to which you push your functions in advance. Then have your threads pull functions from the queue and execute them.

Thread basic..Help Required [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
what is thread
the difference between cases with using mutex and without using mutex
difference between using join() method and without using join()
which low-level functions are called when you create a thread with the std::thread class constructor and with pthread
I have read the material on the internet, and I am still asking the question just to further strengthen my understanding.
Thanks in advance
1) A thread allows for parallel execution of your program. Using multiple threads allows multiple processor cores to execute your code, thus (usually) speeding up the program.
2) Because threads allow parallel execution of code, it can happen that thread #1 is reading data while thread #2 is modifying it, which can result in cases you don't want. Mutexes prevent this by making threads wait their turn in these critical sections.
3) Using thread.join() makes the current thread wait for the completion of the thread whose join() method was called.
4) This is really OS specific. For example, Unix-based systems use pthreads as the underlying thread implementation when creating a std::thread. The compiler vendor implements this.
If you would like to learn multithreading with the C++ standard library, please refer to C++ Concurrency in Action (by Anthony Williams). It is a very good book, also recommended on The Definitive C++ Book Guide and List.
what is thread
A thread is an execution unit with its own program counter, stack, and set of registers. Threads are used in applications to improve performance and make effective use of the CPU.
The CPU switches rapidly back and forth among the threads, giving the illusion that the threads are running in parallel.
Refer https://en.wikipedia.org/wiki/Thread_%28computing%29
the difference between cases with using mutex and without using mutex
Imagine for a moment that you're sharing an apartment with a friend. There's only one kitchen and only one bathroom. Unless you're particularly friendly, you can't both use the bathroom at the same time, and if your roommate occupies the bathroom for a long time, it can be frustrating if you need to use it.
Likewise, though it might be possible to both cook meals at the same time, if you have a combined oven and grill, it's just not going to end well if one of you tries to grill some sausages at the same time as the other is baking a cake. Furthermore, we all know the frustration of sharing a space and getting halfway through a task only to find that someone has borrowed something we need or changed something from the way we left it.
It's the same with threads. If you're sharing data between threads, you need to have rules for which thread can access which bit of data when, and how any updates are communicated to the other threads that care about that data.
When you have a multi-threaded application, the different threads sometimes share a common resource, such as a global variable or a file handle.
A mutex can be used to synchronize access to a single resource. Other synchronization primitives (like semaphores) are available to synchronize multiple threads and processes.
The concept is called "mutual exclusion" (mutex for short), and it is a way to ensure that only one thread at a time is allowed inside a critical section, using that resource, etc.
difference between using join() method and without using join()
The calling thread waits for the specified thread to terminate. If that thread has already terminated, join() returns immediately. The thread being joined must be joinable.
By default, threads are joinable unless you change their attributes.
which low-level functions is called when you create thread with std::thread class constructor and with using pthread.
pthread_create is called in the case of Linux. The std::thread library is platform independent, so it calls the threading API specific to the underlying operating system.

How to use FFTW in multithreading?

I am using two Boost threads, each of which uses a different FFTW plan (for example, thread 1 uses plan_fft and thread 2 uses plan_ifft). When I run only one thread (thread 2), it works perfectly, but when I run both threads I get a segmentation fault. I think this may be because plan creation is not thread safe. It would be a great help if someone could explain how to use two different fftw_plans (one per thread) in two threads in parallel.
I forgot to mention the solutions suggested by the FFTW developers for multithreading:
using semaphore locks
creating all the plans in the one thread
I implemented the 2nd one (i.e. I created all the plans in the main program and then launched the two threads from it). When I do so, there are no errors or segmentation faults, but I am not getting the result.
Please note: these two threads are independent and not sharing any common data, so I think a semaphore lock won't work for my case.
My doubt: can we create (and destroy) plans in the main program and execute these two different plans in two different threads?
The FFTW folks provide a nice summary of the thread-safety topic here. Wrap-up: nothing is thread safe except fftw_execute, so you have to take care that, e.g., only a single thread creates plans. However, it should be no problem to execute them in parallel.

MPI with thread support and Bcast calls

I was wondering: in an MPI program with thread support, if all the threads make an MPI::Bcast call (making sure that in the sending process only one thread makes the call), is the broadcast received by all the threads of each process, or just by one thread per process (the fastest)?
Common MPI implementations deal with communication among processes. Implementations supporting threads simply allow multiple threads to make some or all MPI calls, rather than just one. Every one of T threads in a process calling MPI_Bcast means that the process has called MPI_Bcast T times, and expects that all of the other ranks on the communicator will do the same.
Depending on the level of thread support provided by your MPI implementation (please check; thread support in MPI implementations can be sketchy), the MPI call may be made only once per process.
To add to the answer given by Novelocrat:
The basic unit of computation in MPI is the "rank." In most (all?) interesting implementations of MPI, a rank IS a process. All of the threads within a process share the same Rank ID.
The MPI Standard supports multiple levels of thread parallelism: MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED, MPI_THREAD_SERIALIZED, and MPI_THREAD_MULTIPLE.
Of these, only MPI_THREAD_MULTIPLE actually has multiple threads making overlapping calls into the MPI library. The other three cases are an assertion from the application that the rank can be treated as if it were "single threaded." For more, see the MPI Standard entry on MPI_INIT_THREAD.