MPI distributed, unordered work - c++

I would like to write a MPI program where the master thread continuously submits new job to the workers (i.e. not just at the start, like in the MapReduce pattern).
Initially, lets say, I submit 100 jobs onto 100 workers.
Then, I would like to be notified when a worker is finished with a job. I would send the next job off, whose parameters depend on all the results received so far. The order of results does not have to be preserved, I would just need them as they finish.
I can work with C/C++/Python.
From the documentation, it seems like I can broadcast N jobs, and gather the results. But this is not what I need, as I don't have all of them available, and gather would block. I am looking for a asynchronous, any-worker recv call, essentially.

You can use MPI_ANY_SOURCE and MPI_ANY_TAG for receiving from anywhere. After receiving you can read the Information (source and tag) out of the MPI_Status structure that has to passed to the MPI_Recv call.
If you use this you do not neccessary need any asynchronous communication, since the master 'listens' to everybody asking for new jobs and returning results; and each slave does his task and then sends the result to the master asks for new work and waits for the answer from the master.
You should not have to work with scatter/gather at all since those are ment for use on an array of data and your problem seems to have more or less independant tasks.

Related

How to implement long running gRPC async streaming data updates in C++ server

I'm creating an async gRPC server in C++. One of the methods streams data from the server to clients - it's used to send data updates to clients. The frequency of the data updates isn't predictable. They could be nearly continuous or as infrequent as once per hour. The model used in the gRPC example with the "CallData" class and the CREATE/PROCESS/FINISH states doesn't seem like it would work very well for that. I've seen an example that shows how to create a 'polling' loop that sleeps for some time and then wakes up to check for new data, but that doesn't seem very efficient.
Is there another way to do this? If I use the "CallData" method can it block in the 'PROCESS' state until there's data (which probably wouldn't be my first choice)? Or better, can I structure my code so I can notify a gRPC handler when data is available?
Any ideas or examples would be appreciated.
In a server-side streaming example, you probably need more states, because you need to track whether there is currently a write already in progress. I would add two states, one called WRITE_PENDING that is used when a write is in progress, and another called WRITABLE that is used when a new message can be sent immediately. When a new message is produced, if you are in state WRITABLE, you can send immediately and go into state WRITE_PENDING, but if you are in state WRITE_PENDING, then the newly produced message needs to go into a queue to be sent after the current write finishes. When a write finishes, if the queue is non-empty, you can grab the next message from the queue and immediately start a write for it; otherwise, you can just go into state WRITABLE and wait for another message to be produced.
There should be no need to block here, and you probably don't want to do that anyway, because it would tie up a thread that should otherwise be polling the completion queue. If all of your threads wind up blocked that way, you will be blind to new events (such as new calls coming in).
An alternative here would be to use the C++ sync API, which is much easier to use. In that case, you can simply write straight-line blocking code. But the cost is that it creates one thread on the server for each in-progress call, so it may not be feasible, depending on the amount of traffic you're handling.
I hope this information is helpful!

Threading - The fastest way to handle reoccuring threads?

I am writing my first threaded application for an industrial machine that has a very fast line speed. I am using the MFC for the UI and once the user pushes the "Start" machine button, I need to be simultaneously executing three operations. I need to collect data, process it and output results very quickly as well as checking to see if the user has turned the machine "off". When I say very quickly, I expect the analyze portion of the execution to take the longest and it needs to happen in well under a second. I am mostly concerned about overhead elimination associated with threads. What is the fastest way to implement the loop below:
void Scanner(CString& m_StartStop) {
std::thread Collect(CollectData);
while (m_StartStop == "Start") {
Collect.join();
std::thread Analyze(AnalyzeData);
std::thread Collect(CollectData);
Analyze.join();
std::thread Send(SendData);
Send.join();
}
}
I realize this sample is likely way off base, but hopefully it gets the point across. Should I be creating three threads and suspending them instead of creating and joining them over and over? Also, I am a little unclear if the UI needs its own thread since the user needs to able to pause or stop the line at anytime.
In case anyone is wondering why this needs to be threaded as opposed to sequential, the answer is that the line speed of the machine will cause the need to be collecting data for the second part while the first part is being analyzed. Every 1 second equates to 3 ft of linear part movement down this machine.
Think about functionnal problem before thinking about implementation.
So we have a continuous flow of data that need to be collected, analyzed and sent elsewhere, with a supervision point to be able to stop of pause the process.
collection should be limited by the input flow
analyze should only be cpu limited
sending should be io bound
You just need to make sure that the slowest part must be collection.
That is a correct use case for threads. Implementation could use:
a pool of input buffers that would be filled by collect task and used by analyze task
one thread that continuously:
controls if it should exit (a dedicated variable)
takes an input object from the pool
fills it with data
passes it to analyze task
one thread that continuously
waits for the first of an input object from collect task and a request to exit
analyzes the object and prepares output
send the output
Optionnaly, you can have a separate thread for processing the output. In that case, the last lines becomes
passes an output object to the sending task
and we must add:
one thread that continuously
waits for the first of an output object from analze task and a request to exit
send the output
And you must provide a way to signal the request for pause or exit, either with a completely external program and a signalisation mechanism, or a GUI thread
Any threads you need should already be running, waiting for work. You should not create or join threads.
If job A has to finish before job B can start, the completion of job A should trigger the start of job B. That is, when the thread doing job A finished doing job A, it should either do job B itself or trigger the dispatch of job B. There shouldn't need to be some other thread that's waiting for job A to finish so that it can start job B.

What happens to running processes on a continuous Azure WebJob when website is redeployed?

I've read about graceful shutdowns here using the WEBJOBS_SHUTDOWN_FILE and here using Cancellation Tokens, so I understand the premise of graceful shutdowns, however I'm not sure how they will affect WebJobs that are in the middle of processing a queue message.
So here's the scenario:
I have a WebJob with functions listening to queues.
Message is added to Queue and job begins processing.
While processing, someone pushes to develop, triggering a redeploy.
Assuming I have my WebJobs hooked up to deploy on git pushes, this deploy will also trigger the WebJobs to be updated, which (as far as I understand) will kick off some sort of shutdown workflow in the jobs. So I have a few questions stemming from that.
Will jobs in the middle of processing a queue message finish processing the message before the job quits? Or is any shutdown notification essentially treated as "this bitch is about to shutdown. If you don't have anything to handle it, you're SOL."
If we are SOL, is our best option for handling shutdowns essentially to wrap anything you're doing in the equivalent of DB transactions and implement your shutdown handler in such a way that all changes are rolled back on shutdown?
If a queue message is in the middle of being processed and the WebJob shuts down, will that message be requeued? If not, does that mean that my shutdown handler needs to handle requeuing that message?
Is it possible for functions listening to queues to grab any more queue messages after the Job has been notified that it needs to shutdown?
Any guidance here is greatly appreciated! Also, if anyone has any other useful links on how to handle job shutdowns besides the ones I mentioned, it would be great if you could share those.
After no small amount of testing, I think I've found the answers to my questions and I hope someone else can gain some insight from my experience.
NOTE: All of these scenarios were tested using .NET Console Apps and Azure queues, so I'm not sure how blobs or table storage, or different types of Job file types, would handle these different scenarios.
After a Job has been marked to exit, the triggered functions that are running will have the configured amount of time (grace period) (5 seconds by default, but I think that is configurable by using a settings.job file) to finish before they are exited. If they do not finish in the grace period, the function quits. Main() (or whichever file you declared host.RunAndBlock() in), however, will finish running any code after host.RunAndBlock() for up to the amount of time remaining in the grace period (I'm not sure how that would work if you used an infinite loop instead of RunAndBlock). As far as handling the quit in your functions, you can essentially "listen" to the CancellationToken that you can pass in to your triggered functions for IsCancellationRequired and then handle it accordingly. Also, you are not SOL if you don't handle the quits yourself. Huzzah! See point #3.
While you are not SOL if you don't handle the quit (see point #3), I do think it is a good idea to wrap all of your jobs in transactions that you won't commit until you're absolutely sure the job has ran its course. This way if your function exits mid-process, you'll be less likely to have to worry about corrupted data. I can think of a couple scenarios where you might want to commit transactions as they pass (batch jobs, for instance), however you would need to structure your data or logic so that previously processed entities aren't reprocessed after the job restarts.
You are not in trouble if you don't handle job quits yourself. My understanding of what's going on under the covers is virtually non-existent, however I am quite sure of the results. If a function is in the middle of processing a queue message and is forced to quit before it can finish, HAVE NO FEAR! When the job grabs the message to process, it will essentially hide it on the queue for a certain amount of time. If your function quits while processing the message, that message will "become visible" again after x amount of time, and it will be re-grabbed and ran against the potentially updated code that was just deployed.
So I have about 90% confidence in my findings for #4. And I say that because to attempt to test it involved quick-switching between windows while not actually being totally sure what was going on with certain pieces. But here's what I found: on the off chance that a queue has a new message added to it in the grace period b4 a job quits, I THINK one of two things can happen: If the function doesn't poll that queue before the job quits, then the message will stay on the queue and it will be grabbed when the job restarts. However if the function DOES grab the message, it will be treated the same as any other message that was interrupted: it will "become visible" on the queue again and be reran upon the restart of the job.
That pretty much sums it up. I hope other people will find this useful. Let me know if you want any of this expounded on and I'll be happy to try. Or if I'm full of it and you have lots of corrections, those are probably more welcome!

Non-blocking data sharing through OpenMPI

I'm trying to spread data across multiple workers using OpenMPI, however, I'm doing the data division in a fairly custom way that is not amenable to MPI_Scatter or MPI_Broadcast. What I would like to do is to give each processor some work in a queue (or, some other async mechanism) such that they can do their work on the first chunk of data, take the next chunk, repeat until no more chunks.
I know of MPI_Isend, however if I send data with MPI_Isend I can't modify it until it's finished sending; forcing me to use MPI_Wait and thus having to wait until the thread is finished receiving the data anyway!
Is there a standard a solution to this problem, or must I rethink my approach?
Using MPI_ISEND doesn't necessarily mean that the message is received on the remote end. It just means that the buffer is available for reuse. It could be that the message has been buffered internally by Open MPI or that the message actually has been received on the other end. It depends on your message size.
Another option would be to have your workers ask the master process for work when they need it instead of having it pushed to them. Then the master can work only as needed. You could do an MPI_SCATTER for the first message since everyone will be receiving some data. Then after that, have the master do an MPI_RECV(MPI_ANY_SOURCE) to get a message from one of the worker processes.

Unbalanced load (v2.0) using MPI

(the problem is embarrassingly parallel)
Consider an array of 12 cells:
|__|__|__|__|__|__|__|__|__|__|__|__|
and four (4) CPUs.
Naively, I would run 4 parallel jobs and feeding 3 cells to each CPU.
|__|__|__|__|__|__|__|__|__|__|__|__|
=========|========|========|========|
1 CPU 2 CPU 3 CPU 4 CPU
BUT, it appears, that each cell has different evaluation time, some cells are evaluated very quickly, and some are not.
So, instead of wasting "relaxed CPU", I think to feed EACH cell to EACH CPU at time and continue until the entire job is done.
Namely:
at the beginning:
|____|____|____|____|____|____|____|____|____|____|____|____|
1cpu 2cpu 3cpu 4cpu
if, 2cpu finished his job at cell "2", it can jump to the first empty cell "5" and continue working:
|____|done|____|____|____|____|____|____|____|____|____|____|
1cpu 3cpu 4cpu 2cpu
|-------------->
if 1cpu finished, it can take sixth cell:
|done|done|____|____|____|____|____|____|____|____|____|____|
3cpu 4cpu 2cpu 1cpu
|------------------------>
and so on, until the full array is done.
QUESTION:
I do not know a priori which cell is "quick" and which cell is "slow", so I cannot spread cpus according to the load (more cpus to slow, less to quick).
How one can implement such algorithm for dynamic evaluation with MPI?
Thanks!!!!!
UPDATE
I use a very simple approach, how to divide the entire job into chunks, with IO-MPI:
given: array[NNN] and nprocs - number of available working units:
for (int i=0;i<NNN/nprocs;++i)
{
do_what_I_need(start+i);
}
MPI_File_write(...);
where "start" corresponds to particular rank number. In simple words, I divide the entire NNN array into fixed size chunk according to the number of available CPU and each CPU performs its chunk, writes the result to (common) output and relaxes.
IS IT POSSIBLE to change the code (Not to completely re-write in terms of Master/Slave paradigm) in such a way, that each CPU will get only ONE iteration (and not NNN/nprocs) and after it completes its job and writes its part to the file, will Continue to the next cell and not to relax.
Thanks!
There is a well known parallel programming pattern, known under many names, some of which are: bag of tasks, master / worker, task farm, work pool, etc. The idea is to have a single master process, which distributes cells to the other processes (workers). Each worker runs an infinite loop in which it waits for a message from the master, computes something and then returns the result. The loop is terminated by having the master send a message with a special tag. The wildcard tag value MPI_ANY_TAG can be used by the worker to receive messages with different tags.
The master is more complex. It also runs a loop but until all cells have been processed. Initially it sends each worker a cell and then starts a loop. In this loop it receives a message from any worker using the wildcard source value of MPI_ANY_SOURCE and if there are more cells to be processed, sends one of them to the same worker that have returned the result. Otherwise it sends a message with a tag set to the termination value.
There are many many many readily available implementations of this model on the Internet and even some on Stack Overflow (for example this one). Mind that this scheme requires one additional MPI process that often does very little work. If this is unacceptable, one can run a worker loop in a separate thread.
You want to implement a kind of client-server architecture where you have workers asking the server for work whenever they are out of work.
Depending on the size of the chunks and the speed of your communication between workers and server, you may want to adjust the size of the chunks sent to workers.
To answer your updated question:
Under the master/slave (or worker pool if that's how you prefer it to be labelled) model, you will basically need a task scheduler. The master should have information about what work has been done and what still needs to be done. The master will give each process some work to be done, then sit and wait until a process completes (using nonblocking receives and a wait_all). Once a process completes, have it send the data to the master then wait for the master to respond with more work. Continue this until the work is done.