What are the semantics of MPI_Bcast when the number of processes participating in the call differs from the number of processes spawned at the start of the computation?
I need to handle a situation where the user specifies more processes than are necessary to perform the computation. For example, the user may decide to spawn 16 processes with mpirun when I only need 12 to split the problem among processes. I'm handling this by comparing the process rank to 12 and ending a process with MPI_Finalize when its rank is too high. I think this causes a deadlock in my application, because MPI_Bcast wants to send to all processes?
How should I handle that? Should I just invoke MPI_Bcast in all processes, but ignore the output in some of them?
Given that we received what could be considered invalid input from the user, must we really continue program execution after realizing this? Isn't it better to display an error message saying that an invalid number of processes was requested, and also inform the user about the allowed interval (e.g. "16 processes were requested, but the maximum number of processes is 12; the program will therefore now exit")?
Otherwise, if that isn't a possible solution in your situation, chapter 6, "Groups, Contexts, Communicators, and Caching", and/or chapter 10, "Process Creation and Management", of the MPI 2.2 documentation might be of help. There is probably other, easier-to-read documentation available elsewhere, but at least it's a start.
At program startup, every process should look at its own rank (from MPI_Comm_rank), the total number of processes (from MPI_Comm_size(MPI_COMM_WORLD)), and the number the computation actually needs. Have the ranks below the number you need create a new communicator that you'll actually use to do your work, and have all the remaining ranks just call MPI_Finalize.
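The approach above can be sketched with MPI_Comm_split; a minimal example, assuming the computation needs 12 ranks (surplus ranks receive MPI_COMM_NULL and simply finalize):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int needed = 12; /* assumed: ranks the computation actually needs */

    /* Ranks below `needed` join color 0; the rest opt out entirely. */
    int color = (rank < needed) ? 0 : MPI_UNDEFINED;
    MPI_Comm work_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &work_comm);

    if (work_comm == MPI_COMM_NULL) {
        /* Surplus rank: nothing to do; every rank must still finalize. */
        MPI_Finalize();
        return 0;
    }

    /* All collectives now use work_comm, so this Bcast only involves
       the `needed` ranks and cannot deadlock on the surplus ones. */
    int data = (rank == 0) ? 42 : 0;
    MPI_Bcast(&data, 1, MPI_INT, 0, work_comm);
    printf("rank %d received %d\n", rank, data);

    MPI_Comm_free(&work_comm);
    MPI_Finalize();
    return 0;
}
```

Note that this requires an MPI installation and must be launched with mpirun; the point is simply that collectives are scoped to the communicator they are called on.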
I have two processes in C++ (these are not parent and child processes). Each has been pinned to a specific core using taskset. So, for instance, process 1 is pinned to core 0 and process 2 is pinned to core 1. I want them to start running at exactly the same time (with microsecond accuracy).
timer_create allows some code to execute at a specified frequency. However, it doesn't provide a mechanism to start at a specific time. What would be the best way to configure a start time such that a section of code in both processes starts at that particular time, and then repeats at a certain frequency?
The best ways are going to be very system- and OS-dependent.
The most generic way I can think of is to block both tasks on a counting semaphore, and use a third task that waits on a timer and posts the semaphore at the desired rate. The degree of precision you want will be difficult to achieve without an RTOS.
I know how to synchronize between processes and how to synchronize between threads, but I don't know how to synchronize between processes that each have multiple threads.
Consider a scenario:
I am creating 10 threads in a process. Each thread is printing a number.
I am running 10 processes simultaneously.
Now I need to synchronize the processes and threads such that, at the end, 100 unique numbers are printed to the console. Note that I don't expect the numbers to be printed in sequence, but all 100 numbers printed should be unique.
Note also that this should not be limited to the above task. It may happen that I want only one thread of one of the processes to execute certain code.
How do I write this program in C++ for Red Hat?
Note the compiler version: gcc 4.4.7.
This question already has an answer here:
Does a call to MPI_Barrier affect every thread in an MPI process?
Let's say I have 2 processes, each with two threads (1 IO thread, 1 compute thread).
I am interested in using an IO library (ADIOS).
I am wondering what would happen if I coded something like this:
let's say the IO threads in the 2 processes do some IO and call MPI_Barrier(MPI_COMM_WORLD) at some point B to synchronize the IO;
the compute threads in the two processes also call MPI_Barrier(MPI_COMM_WORLD), at some point A, to synchronize the computation (while the IO threads are working).
---> I don't know exactly what might happen; is the following case possible?
Process 1, IO Thread waits at B
Process 2, Compute thread waits at A
=> Process 1 and Process 2 get synchronized with each other, so Process 1 leaves the barrier at B and Process 2 leaves it at A (even though Process 2 is not at the same synchronization point!)
If that can happen, isn't it unwanted behavior that the programmer never intended? Can it be avoided by using two different communicators with identical sets of processes (MPI_Comm_dup(...))?
Or does the barrier really depend on the code line? If so, how would that be implemented?
This is confusing!
Thanks a lot!
The first scenario is very likely to happen (barrier calls from different threads matching each other). From MPI's point of view, a barrier must be entered by all ranks in the communicator, no matter which thread makes the barrier call or at which code line it appears. MPI has no notion of thread identity: all threads together are treated as a single entity, a rank. The only special treatment is that when the MPI_THREAD_MULTIPLE thread-support level is provided, the library must implement proper locking so that MPI calls can be made from any thread at any time.
That's why it is highly advisable that parallel-library authors always duplicate the world communicator and use the duplicate for their internal communication needs. That way the library code won't interfere with the user code (with some special exceptions that could result in deadlocks).
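A minimal sketch of that advice, assuming MPI_THREAD_MULTIPLE is available: barriers on different communicators can never match each other, so giving the IO side its own duplicate of the world communicator prevents the cross-matching described above.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not supported\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Same group of ranks, but a distinct communication context. */
    MPI_Comm io_comm;
    MPI_Comm_dup(MPI_COMM_WORLD, &io_comm);

    /* IO thread would call:      MPI_Barrier(io_comm);
     * Compute thread would call: MPI_Barrier(MPI_COMM_WORLD);
     * The two barriers use different contexts, so an IO-thread barrier
     * can only ever match other io_comm barriers, never the compute ones. */

    MPI_Comm_free(&io_comm);
    MPI_Finalize();
    return 0;
}
```

This is the pattern libraries such as ADIOS are expected to follow internally: duplicate once at initialization and route all internal collectives through the duplicate.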
I am working on Linux server programming and I am not sure about the proper number of child threads for my thread pool. These threads do the actual work, including parsing, data processing, and more. If my server has 8 cores, what is the proper number of child threads to spawn? Thanks in advance.
If your question is about how many threads you actually have, then man ps will give you a number of options for showing the running threads as if they were processes; all you need to do is count them.
If your question is about how many threads you should create, then that entirely depends on your application and what it does. If every thread is free of IO and synchronous calls, and all it does is number crunching, then you should probably have no more than 8 threads (one per core). However, if your application performs anything that causes it to wait for external IO, then you could benefit from more. And I say could, as it entirely depends on how your application is built; in most cases you need to experiment.
Typically you would test first with 8 threads and see how many requests per second you can handle, then increase the thread count and run the experiment again. Plot the results as a curve and find the sweet spot where extra threads add no (or little) additional value; that is the thread count you should configure. This assumes, of course, that you are not creating other bottlenecks such as memory pressure or swapping, which you would want to avoid.
I'm working on my pthreads assignment. I'm new to pthreads and have never touched them before. Are there any sample codes or resources out there that might aid me in my assignment?
Here are my assignment details. A pthread program about queue system:
Write a C/C++ Pthread program for a Dental clinic’s queuing system that declares an
array of integers of size N, where N is the maximum number of queue for the day. The
pthread program uses two threads.
Whenever there is a new dental appointment, the first
thread (the creator) puts the queue numbers in the array, one after the other. The
second thread (the remover) removes the queue numbers from the array whenever the
dentist has seen the patient. This is done in a FIFO fashion (First In First Out).
The algorithm of the creator is as follows:
• If the array is not full, then put a new number in it (the numbers start at 1
and are incremented by one each time, so the creator creates queue numbers 1, 2, 3, etc.)
• sleep for 1 to 10 seconds, randomly
• repeat
The algorithm of the remover is as follows:
• If the array is not empty then remove its smallest queue number.
• sleep for 1 to 10 seconds, randomly
• repeat
You should use mutex locks to protect things that must be protected. Each thread
should print on the screen what it is doing (eg: "number 13 is added into the queue",
"number 7 is removed from the queue", etc.). The program should run forever.
Any help will be appreciated.
Thanks.
For generally starting out with pthreads, this is a good website with possibly more info than you need (but I like detail). It runs through a lot of the basics for pthreads and more. If you prefer a dead-tree tutorial, this book is pretty good and gives you a good grounding in most of the features of the Linux API, or the core libraries, if you want to call it that. This stackoverflow question deals with mutexes vs semaphores pretty concisely.
Finally, I like this site for its coverage of Linux threading and synchronisation.
Hopefully these give you some reading material. Work out how you handle threads, then how you synchronise them, then attack your problem.
This is the classic producer-consumer problem.
There are many ways to solve this, but the easiest is to have one lock on the queue: when you add or remove an item (in the producer or consumer, respectively), lock the queue, do the work, and then unlock it. In the consumer, go on to process the item; in the producer, go on to produce the next one.
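The lock-the-queue idea can be sketched as a bounded circular buffer with one mutex and two condition variables; a minimal version, assuming a small fixed capacity N (the creator and remover threads would call enqueue and dequeue in loops with their random sleeps):

```c
#include <pthread.h>
#include <stdio.h>

#define N 8  /* queue capacity (assumed small for the sketch) */

/* Circular-buffer FIFO guarded by one mutex; condition variables let
 * the creator wait while the array is full and the remover wait while
 * it is empty. */
static int queue[N];
static int head = 0, tail = 0, count = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

void enqueue(int number) {
    pthread_mutex_lock(&lock);
    while (count == N)                     /* wait while the array is full */
        pthread_cond_wait(&not_full, &lock);
    queue[tail] = number;
    tail = (tail + 1) % N;
    count++;
    printf("number %d is added into the queue\n", number);
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
}

int dequeue(void) {
    pthread_mutex_lock(&lock);
    while (count == 0)                     /* wait while the array is empty */
        pthread_cond_wait(&not_empty, &lock);
    int number = queue[head];              /* FIFO: oldest number first */
    head = (head + 1) % N;
    count--;
    printf("number %d is removed from the queue\n", number);
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&lock);
    return number;
}
```

Because the printf calls happen while the mutex is held, the log messages also cannot interleave, which addresses the console-output requirement as well.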
You may wish to lay out your data structures first and then define the locks, describing specifically what each lock protects, so you can be sure that all the data both threads access stays synchronized.
Thanks for tagging this as homework; I hope this gets you started in the right direction. You may also wish to lock around operations like printing to the console to ensure they don't overlap.