Using MPI, how can I synchronize the end of inter-dependent processes? - concurrency

Ok, the title sounds confusing but the concept is not too bad. Basically, I have two processes that are running (let's call them process 0 and process 1). They both run a function at the same time. While this function is running, sometimes they need data from each other. So process 0 sometimes requests data from process 1 and vice versa. Since they rely on each other, I don't want one process to finish before the other. If process 0 finishes its work, it should continue checking for requests from process 1 (otherwise process 1 won't be able to finish). After both processes have finished their work, only then should they proceed.
I'm having trouble implementing this. Right now, I have each process send all other processes a notification when it finishes (so process 1 sends a notification to process 2 when its work is done). Then I have a loop that is supposed to continue until it receives a notification from all the other processes. Only then should the loop exit and the process continue. However, this isn't working. The processes keep going before the others have finished. I feel like there's probably a much simpler way to do this that I'm not thinking of.
I'm a complete newbie to MPI, so I hope I've explained this properly. Also, this needs to work for any number of processes, not just two. Thanks for your help!

Related

How to wait for unknown number of processes to end

The scenario:
There are several processes running on a machine. Names and handles unknown, but they all have a piece of code running in them that's under our control.
A command line process is run. It signals to the other processes that they need to end (SetEvent), which our code picks up and handles within the other processes.
The goal:
The command line process needs to wait until the other processes have ended. How can this be achieved?
All that's coming to mind is to set up some shared memory or something and have each process write its handle into it so the command line process can wait on them, but this seems like so much effort for what it is. There must be some kernel level reference count that can be waited on?
Edit 1:
I'm thinking maybe assigning the processes to a job object, then the command line processes can wait on that? Not ideal though...
Edit 2:
Can't use job objects as it would interfere with other things using jobs. So now I'm thinking that the processes would obtain a handle to some/any sync object (semaphore, event, etc), and the command line process would poll for its existance. It would have to poll as if it waited it would keep the object alive. The sync object gets cleaned up by windows when the processes die, so the next poll would indicate that there are no processes. Not the niceset, cleanest method, but simple enough for the job it needs to do. Any advance on that?
You can do either of following ways.
Shared Memory (memory mapped object) : CreateFileMapping, then MapViewOfFile --> Proceed the request. UnmapViewFile. Close the file,
Named Pipe : Create a nameed pipe for each application. And keep running a thread to read the file. So, You can write end protocol from your application by connecting to that named pipe. ( U can implement a small database as like same )
WinSock : (Dont use if you have more number of processes. Since you need to send end request to the other process. Either the process should bind to your application or it should be listening in a port.)
Create a file/DB : Share the file between the processes. ( You can have multiple files if u needed ). Make locking before reading or writing.
I would consider a solution using two objects:
a shared semaphore object, created by the main (controller?) app, with an initial count of 0, just before requesting the other processes to terminate (calling SetEvent()) - I assume that the other processes don't create this event object, neither they fail if it has not been created yet.
a mutex object, created by the other (child?) processes, used not for waiting on it, but for allowing the main process to check for its existence (if all child processes terminate it should be destroyed). Mutex objects have the distinction that can be "created" by more than one processes (according to the documentation).
Synchronization would be as follows:
The child processes on initialization should create the Mutex object (set initial ownership to FALSE).
The child processes upon receiving the termination request should increase the semaphore count by one (ReleaseSemaphore()) and then exit normally.
The main process would enter a loop calling WaitForSingleObject() on the semaphore with a reasonably small timeout (eg some 250 msec), and then check not whether the object was granted or a timeout has occurred, but whether the mutex still exists - if not, this means that all child processes terminated.
This setup avoids making an interprocess communication scheme (eg having the child processes communicating their handles back - the number of which is unknown anyway), while it's not strictly speaking "polling" either. Well, there is some timeout involved (and some may argue that this alone is polling), but the check is also performed after each process has reported that it's terminating (you can employ some tracing to see how many times the timeout has actually elapsed).
The simple approach: you already have an event object that every subordinate process has open, so you can use that. After setting the event in the master process, close the handle, and then poll until you discover that the event object no longer exists.
The better approach: named pipes as a synchronization object, as already suggested. That sounds complicated, but it isn't.
The idea is that each of the subordinate processes creates an instance of the named pipe (i.e., all with the same name) when starting up. There's no need for a listening thread, or indeed any I/O logic at all; you just need to create the instance using CreateNamedPipe, then throw away the handle without closing it. When the process exits, the handle is closed automatically, and that's all we need.
To see whether there are any subordinate processes, the master process would attempt to connect to that named pipe using CreateFile. If it gets a file not found error, there are no subordinate processes, so we're done.
If the connection succeeded, there's at least one subordinate process that we need to wait for. (When you attempt to connect to a named pipe with more than one available instance, Windows chooses which instance to connect you to. It doesn't matter to us which one it is.)
The master process would then call ReadFile (just a simple synchronous read, one byte will do) and wait for it to fail. Once you've confirmed that the error code is ERROR_BROKEN_PIPE (it will be, unless something has gone seriously wrong) you know that the subordinate process in question has exited. You can then loop around and attempt another connection, until no more subordinate processes remain.
(I'm assuming here that the user will have to intervene if one or more subordinates have hung. It isn't impossible to keep track of the process IDs and do something programmatically if that is desirable, but it's not entirely trivial and should probably be a separate question.)

Threading - The fastest way to handle reoccuring threads?

I am writing my first threaded application for an industrial machine that has a very fast line speed. I am using the MFC for the UI and once the user pushes the "Start" machine button, I need to be simultaneously executing three operations. I need to collect data, process it and output results very quickly as well as checking to see if the user has turned the machine "off". When I say very quickly, I expect the analyze portion of the execution to take the longest and it needs to happen in well under a second. I am mostly concerned about overhead elimination associated with threads. What is the fastest way to implement the loop below:
void Scanner(CString& m_StartStop) {
std::thread Collect(CollectData);
while (m_StartStop == "Start") {
Collect.join();
std::thread Analyze(AnalyzeData);
std::thread Collect(CollectData);
Analyze.join();
std::thread Send(SendData);
Send.join();
}
}
I realize this sample is likely way off base, but hopefully it gets the point across. Should I be creating three threads and suspending them instead of creating and joining them over and over? Also, I am a little unclear if the UI needs its own thread since the user needs to able to pause or stop the line at anytime.
In case anyone is wondering why this needs to be threaded as opposed to sequential, the answer is that the line speed of the machine will cause the need to be collecting data for the second part while the first part is being analyzed. Every 1 second equates to 3 ft of linear part movement down this machine.
Think about functionnal problem before thinking about implementation.
So we have a continuous flow of data that need to be collected, analyzed and sent elsewhere, with a supervision point to be able to stop of pause the process.
collection should be limited by the input flow
analyze should only be cpu limited
sending should be io bound
You just need to make sure that the slowest part must be collection.
That is a correct use case for threads. Implementation could use:
a pool of input buffers that would be filled by collect task and used by analyze task
one thread that continuously:
controls if it should exit (a dedicated variable)
takes an input object from the pool
fills it with data
passes it to analyze task
one thread that continuously
waits for the first of an input object from collect task and a request to exit
analyzes the object and prepares output
send the output
Optionnaly, you can have a separate thread for processing the output. In that case, the last lines becomes
passes an output object to the sending task
and we must add:
one thread that continuously
waits for the first of an output object from analze task and a request to exit
send the output
And you must provide a way to signal the request for pause or exit, either with a completely external program and a signalisation mechanism, or a GUI thread
Any threads you need should already be running, waiting for work. You should not create or join threads.
If job A has to finish before job B can start, the completion of job A should trigger the start of job B. That is, when the thread doing job A finished doing job A, it should either do job B itself or trigger the dispatch of job B. There shouldn't need to be some other thread that's waiting for job A to finish so that it can start job B.

g_main_loop uses 100% CPU

I have built my first application using glibmm. I'm using a lot of threads as it does heavy processing. I have tried to follow the guidelines concerning multithreading, i.e. not doing any GUI updates from other threads than the one where g_main_loop is running.
I do a lot of graphics rendering in worker threads but I always only update a PixBuf which is later drawn by the widgets on_draw() from the main loop.
All was fine as long as the data I render was read from files. When I started streaming data from a server which I render at regular intervals then the problems started.
Every now and then, especially when executing multiple instances of my application simultaneously, I see that the main threads takes 100% CPU time. Running strace on the process shows that g_main_loop has ended up in an eternal loop calling poll:
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=10, events=POLLIN}, {fd=8, events=POLLIN}], 4, 100) = 1 ([{fd=10, revents=POLLIN}])
In proc I get this for file-descriptor 10: 10 -> socket:[1132750]
The poll always returns immediately as file-descriptor 10 has something to offer. This goes on forever so I assume that the file-descriptor is never read. The odd thing is that running 5 applications will almost always lead to all 5 ending up in the infinite poll loop after just a couple of minutes while running only instance one seems to work more than 30 minutes most of the times I try.
Why is this happening and is there any way to debug this?
My mistake was that I called queue_draw() from one of my worker threads. Given that the function is called "queue", I assumed it would queue a redraw which would later be executed by the g_main_loop. As it turned out, this was what broke the g_main_loop. I wish libgtkmm would have a little more detail about these multithreading restrictions in its reference manual.
My solution, to the problem was adding Glib::Dispatcher queueRedraw to my Widget and connecting it to the queue_draw() function:
queueRedraw.connect(sigc::mem_fun(*this, &MyWidgetClass::queue_draw))
Calling queueRedraw() signals the main thread to call the queue_draw() function.
I don't know if this is the best approach, but it solves the problem.

Broadcast message for all processes to exit(MPI)

[MPi-C++]
I made an application that under a specific condition it should close the application in all processes.
I tried to made it using root process but I want to send message to all other processes to terminate also. How can I make this???
There is no way to quit an MPI application cleanly on all processes without communication. That means, if you have a condition that occurs only on a subset of the processes of your MPI application (e.g. you have an error on one of processes), the only way to unilaterally quit the application is to call MPI_Abort. This will result in all MPI processes coming to an abrupt end, no matter where in the code each rank was at that moment. Since MPI_Abort is not a collective routine, it is not possible to perform any cleanup on any of the other ranks.
If you wish to have a clean exit, you need to regularly communicate between all ranks whether everything is still working on all ranks, or if it is time to quit. For example, you could regularly call MPI_Allreduce with MPI_SUM as the operation. If your exit condition is fulfilled on a process, make it send 1 as the data, otherwise make it send 0. Now you only need to check after the MPI_Allreduce if the sum is larger than 0, and if it is, quit your application in an orderly fashion.

Number of parallel instances of my process (app)

Is there some portable way to check the number of parallel instances of my app?
I have a c++ app (win32) where I need to know how often it was started. The problem is
that several user can start it parallel (terminal server), so i cannot search the "running process" list because I'm not able to access the the list of other users.
I tried it with Semaphore (boost & win32 CreateSemaphore)
It worked, but now I have the problem if the app crashes (Assertion or just kill the process) the counter is not changed. (rebooting helps)
Also manually removing/resetting the semaphore counter in my code is not possible because I don't know if somebody else is running my application.
Edited to add:
Suppose you have a license that lets you run 20 full-functionality copies of your program. Then you could have 20 mutexes, named MyProgMutex1 through MyProgMutex20. At startup, your program can loop through the mutexes. If it finds a spare mutex that it can take, it stops looping and enters full-functionality mode. If it loops through all the mutexes without being able to take any of them, then it enters reduced-functionality mode.
Original answer:
I assume you want to make sure that only one copy of your process runs at once. (Or, for Terminal Server, one copy of your process per login session).
Your named semaphore solution is close. The right way to do this is a named mutex. Use CreateMutex to make the mutex, then call WaitForSingleObject with a timeout of zero. If WaitForSingleObject returns WAIT_TIMEOUT, another copy of the process is running. If it returns WAIT_OBJECT_0 or WAIT_ABANDONED, then you are the only copy of the process. You need to keep the mutex handle open while your program runs - either call CloseHandle when your process is about to exit, or just deliberately leak the handle and rely on Window's built-in cleanup to release the handle for you when your process exits. Windows will automatically increment the mutex's counter when your process exits.
The only thing I can think of that mitigates the problem of crashed processes is a kind of “dead man’s switch”: each process needs to update its status in regular intervals. If a process fails to do this, it’s automatically discarded from the list of active processes.
This technique requires that one of the processes acts as a server which keeps tab of whether other processes have updated recently. If the server dies, then another process can take over. This, in turn, requires that each process tests whether there still is a server alive.
Alternatively, each process can be its own server and keep track locally. This may be easier to implement than server-switching.
You can broadcast message and other instances of your application should send some response. You count responses - you get number of instances.