I'm trying to spread data across multiple workers using Open MPI; however, I'm dividing the data in a fairly custom way that is not amenable to MPI_Scatter or MPI_Bcast. What I would like to do is give each processor some work in a queue (or some other async mechanism) such that it can do its work on the first chunk of data, take the next chunk, and repeat until there are no more chunks.
I know of MPI_Isend; however, if I send data with MPI_Isend I can't modify it until the send has completed, forcing me to use MPI_Wait and thus to wait until the worker has finished receiving the data anyway!
Is there a standard solution to this problem, or must I rethink my approach?
Completion of an MPI_ISEND doesn't necessarily mean that the message has been received on the remote end. It just means that the buffer is available for reuse. The message may have been buffered internally by Open MPI, or it may actually have been received on the other end; which one happens depends on your message size.
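For example, here is a minimal double-buffering sketch (the names fill_chunk, CHUNK, and NCHUNKS are made up for illustration): while one buffer's MPI_Isend is in flight, the sender fills and sends the other, so MPI_Wait is only needed before a buffer is reused, not after every send.

    #include <mpi.h>

    enum { CHUNK = 1024, NCHUNKS = 8 };

    // Stand-in for producing the next chunk of real data.
    void fill_chunk(int* buf, int n, int value) {
      for (int i = 0; i < n; ++i) buf[i] = value;
    }

    void send_chunks(int dest) {
      int bufs[2][CHUNK];
      MPI_Request req[2] = {MPI_REQUEST_NULL, MPI_REQUEST_NULL};
      for (int c = 0; c < NCHUNKS; ++c) {
        int b = c % 2;
        // Wait only if this buffer's previous send is still pending.
        // Completion means the buffer is reusable, NOT that the message
        // was necessarily received remotely.
        MPI_Wait(&req[b], MPI_STATUS_IGNORE);
        fill_chunk(bufs[b], CHUNK, c);
        MPI_Isend(bufs[b], CHUNK, MPI_INT, dest, 0, MPI_COMM_WORLD, &req[b]);
      }
      MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    }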
Another option would be to have your workers ask the master process for work when they need it, instead of having it pushed to them. Then the master can dole out work only as needed. You could do an MPI_SCATTER for the first message, since everyone will be receiving some data. After that, have the master do an MPI_RECV(MPI_ANY_SOURCE) to get a message from one of the worker processes.
I'm creating an async gRPC server in C++. One of the methods streams data from the server to clients - it's used to send data updates to clients. The frequency of the data updates isn't predictable. They could be nearly continuous or as infrequent as once per hour. The model used in the gRPC example with the "CallData" class and the CREATE/PROCESS/FINISH states doesn't seem like it would work very well for that. I've seen an example that shows how to create a 'polling' loop that sleeps for some time and then wakes up to check for new data, but that doesn't seem very efficient.
Is there another way to do this? If I use the "CallData" method can it block in the 'PROCESS' state until there's data (which probably wouldn't be my first choice)? Or better, can I structure my code so I can notify a gRPC handler when data is available?
Any ideas or examples would be appreciated.
In a server-side streaming example, you probably need more states, because you need to track whether there is currently a write already in progress. I would add two states, one called WRITE_PENDING that is used when a write is in progress, and another called WRITABLE that is used when a new message can be sent immediately. When a new message is produced, if you are in state WRITABLE, you can send immediately and go into state WRITE_PENDING, but if you are in state WRITE_PENDING, then the newly produced message needs to go into a queue to be sent after the current write finishes. When a write finishes, if the queue is non-empty, you can grab the next message from the queue and immediately start a write for it; otherwise, you can just go into state WRITABLE and wait for another message to be produced.
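As a rough illustration (not the official gRPC example code; StreamWriter and StartWrite are made-up names, and StartWrite stands in for something like grpc::ServerAsyncWriter::Write with a completion-queue tag), the state machine might look like this:

    #include <mutex>
    #include <queue>
    #include <string>

    class StreamWriter {
     public:
      // Called by the producer whenever a new message is available.
      void OnNewMessage(const std::string& msg) {
        std::lock_guard<std::mutex> lock(mu_);
        if (state_ == State::WRITABLE) {
          state_ = State::WRITE_PENDING;
          StartWrite(msg);                 // send immediately
        } else {
          pending_.push(msg);              // a write is in flight; queue it
        }
      }

      // Called when the completion queue reports the previous write finished.
      void OnWriteDone() {
        std::lock_guard<std::mutex> lock(mu_);
        if (!pending_.empty()) {
          StartWrite(pending_.front());    // immediately start the next write
          pending_.pop();
        } else {
          state_ = State::WRITABLE;        // idle until the next message
        }
      }

     private:
      enum class State { WRITABLE, WRITE_PENDING };

      void StartWrite(const std::string& msg) {
        // Placeholder: real code would call writer_->Write(response, tag).
      }

      State state_ = State::WRITABLE;
      std::queue<std::string> pending_;
      std::mutex mu_;
    };

The key invariant is that at most one write is ever outstanding, which is what the async API requires.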
There should be no need to block here, and you probably don't want to do that anyway, because it would tie up a thread that should otherwise be polling the completion queue. If all of your threads wind up blocked that way, you will be blind to new events (such as new calls coming in).
An alternative here would be to use the C++ sync API, which is much easier to use. In that case, you can simply write straight-line blocking code. But the cost is that it creates one thread on the server for each in-progress call, so it may not be feasible, depending on the amount of traffic you're handling.
I hope this information is helpful!
I would like to write an MPI program where the master thread continuously submits new jobs to the workers (i.e. not just at the start, as in the MapReduce pattern).
Initially, let's say, I submit 100 jobs to 100 workers.
Then I would like to be notified when a worker finishes a job. I would send off the next job, whose parameters depend on all the results received so far. The order of results does not have to be preserved; I just need them as they finish.
I can work with C/C++/Python.
From the documentation, it seems like I can broadcast N jobs and gather the results. But this is not what I need, as I don't have all the jobs available up front, and gather would block. I am looking for an asynchronous, any-worker receive call, essentially.
You can use MPI_ANY_SOURCE and MPI_ANY_TAG to receive from anywhere. After receiving, you can read the information (source and tag) out of the MPI_Status structure that has to be passed to the MPI_Recv call.
If you use this, you do not necessarily need any asynchronous communication, since the master 'listens' to everybody asking for new jobs and returning results, and each slave does its task, sends the result to the master, asks for new work, and waits for the master's answer.
You should not have to work with scatter/gather at all, since those are meant for use on an array of data, and your problem seems to consist of more or less independent tasks.
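Putting that together, here is a minimal sketch of the pattern (assuming each job and result is a single int, that there are at least as many jobs as workers, and with made-up tags TAG_WORK/TAG_STOP):

    #include <mpi.h>
    #include <stdio.h>

    enum { TAG_WORK = 1, TAG_STOP = 2 };

    int main(int argc, char** argv) {
      int rank, size;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      if (rank == 0) {                       /* master */
        int total_jobs = 100, next_job = 0, outstanding = 0, result;
        MPI_Status st;
        /* seed every worker with one job */
        for (int w = 1; w < size && next_job < total_jobs; ++w) {
          MPI_Send(&next_job, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
          ++next_job; ++outstanding;
        }
        while (outstanding > 0) {
          /* blocks until *any* worker reports back */
          MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                   MPI_COMM_WORLD, &st);
          --outstanding;
          printf("result %d from rank %d\n", result, st.MPI_SOURCE);
          if (next_job < total_jobs) {   /* next job may depend on results so far */
            MPI_Send(&next_job, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
            ++next_job; ++outstanding;
          } else {
            MPI_Send(&next_job, 0, MPI_INT, st.MPI_SOURCE, TAG_STOP, MPI_COMM_WORLD);
          }
        }
      } else {                               /* worker */
        int job;
        MPI_Status st;
        for (;;) {
          MPI_Recv(&job, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
          if (st.MPI_TAG == TAG_STOP) break;
          int result = job * job;            /* stand-in for real work */
          MPI_Send(&result, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
        }
      }
      MPI_Finalize();
      return 0;
    }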
I am writing a networking DLL that I use in my C++Builder project. This DLL works with remote FTP servers. I noticed strange behavior when recv() is called: sometimes it returns 0, but in another thread, when recv() is called on the same socket, data is received as expected.
What does this mean? I also noticed that calling Application->ProcessMessages() inside the DLL thread speeds up data receiving.
But what is wrong? Doesn't ProcessMessages() just process window messages or am I missing something?
Thank you
If I understood you correctly and you are trying to recv() on the same SOCKET from parallel threads, then don't do that; there is nothing to gain from it. The data you recv() has already been buffered by the underlying system, and you are only accessing that buffer. What you could do instead is make multiple buffers for recv(), so that when it returns data you can pass one buffer to the "upper levels" for processing and use another one for the next recv() call. You can also use just one large buffer with notifications about which part is being processed and which part is being used for receiving. The system probably has locks that forbid multiple reads on the same socket, and so the result of one recv() is 0. If it didn't have them, you would probably end up with almost randomly split data.
EDIT: Full and long explanation
I think that using multiple threads to read from a single socket is not useful.
Sockets are a software-regulated thing. Your network device doesn't create any "connections"; it just processes the data received and wraps/unwraps it into IP (or any other supported Internet-layer) packets. (This depends on the network device; some are almost entirely software-emulated by the OS and actually perform just the basic "write to tx / read rx" services, but to us it's the same deal.) The WinSock2 service recognizes packets with specific data (as you have already noticed), so you can use one network device to communicate with multiple peers simultaneously. WinSock2 actively monitors the traffic before handing it out to you. In other words: by the time recv() succeeds, the data was already there; the underlying system has checked the socket you passed to recv() and handed over only the data already marked as belonging to that socket. Reading with multiple threads from one socket (ignoring the almost useless MSG_PEEK) would make the system, if it didn't have locks, copy an unknown number of bytes to the location supplied in thread one's recv() and permanently advance the internal data pointer by the number of bytes copied; then, before all the data available to that recv() had been copied, the other thread would kick in and copy its own unknown number of bytes, advancing the internal pointer again. Ideally, the result of this kind of reading would be half of the data stored at the location supplied by thread 1 and the other half at the location supplied by thread 2. Since even that ideal split is uncertain (the time the system allocates to these two threads is not guaranteed to be equal), you would end up with unsorted data and no means of sorting it, since the information the underlying system uses to know which data belongs to which socket would not be available to you.
Since your system is most likely faster than your network device, I stand by my two solutions below, the first one preferred, as I have been using that method for both big and small chunks of data transfer:
1. Make one reading thread per connected socket and one circular buffer. The size of the buffer depends on the size of the chunks you expect to receive and the time you will need to process them further. Save the current read position and a "to process" count. When data is received, notify the thread/threads that are supposed to process the data in the buffer, and save the position of the data being read. Continue with recv() if there is buffer space not being processed; otherwise wait until there is (you must implement this in case your computer chokes somewhere; in normal situations it shouldn't happen). You must synchronize the receiving thread with the processing thread/threads when they access the "to_process_count" and "current read pos" variables, as those tell you which bytes in your circular buffer can be reused. A sketch of this approach follows the list.
2. Create and connect one socket per desired reading thread, so that the system will know how to regulate the data on its own.
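Here is a minimal sketch of option 1 (assuming one reading thread feeding one processing thread, and that no single chunk exceeds the buffer's capacity; the class and method names are made up):

    #include <algorithm>
    #include <condition_variable>
    #include <cstddef>
    #include <mutex>
    #include <vector>

    // recv() fills a scratch buffer; the reading thread then calls Put().
    class RingBuffer {
     public:
      explicit RingBuffer(std::size_t capacity) : buf_(capacity) {}

      // Reading thread: blocks while there is no free (already processed) space.
      void Put(const char* data, std::size_t n) {
        std::unique_lock<std::mutex> lock(mu_);
        not_full_.wait(lock, [&] { return to_process_ + n <= buf_.size(); });
        for (std::size_t i = 0; i < n; ++i)        // copy, wrapping at the end
          buf_[(write_pos_ + i) % buf_.size()] = data[i];
        write_pos_ = (write_pos_ + n) % buf_.size();
        to_process_ += n;
        not_empty_.notify_one();                   // wake the processing thread
      }

      // Processing thread: returns how many bytes were taken.
      std::size_t Get(char* out, std::size_t max) {
        std::unique_lock<std::mutex> lock(mu_);
        not_empty_.wait(lock, [&] { return to_process_ > 0; });
        std::size_t n = std::min(max, to_process_);
        for (std::size_t i = 0; i < n; ++i)
          out[i] = buf_[(read_pos_ + i) % buf_.size()];
        read_pos_ = (read_pos_ + n) % buf_.size();
        to_process_ -= n;                          // these bytes can now be reused
        not_full_.notify_one();
        return n;
      }

     private:
      std::vector<char> buf_;
      std::size_t read_pos_ = 0, write_pos_ = 0, to_process_ = 0;
      std::mutex mu_;
      std::condition_variable not_empty_, not_full_;
    };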
The thing you are referring to as random threads reading from a single socket may be achievable through the following scenarios:
1. A thread polls the socket to see whether data is available; when data is available, it uses a mutex to wait if some other thread is already in the reading state, then starts a new thread to read and process the existing data.
Or it can be achieved with something like this:
2. A thread does its recv(); as soon as it has done a successful recv() (yay, the data is in the buffer), it starts another thread from some thread pool to do the next recv(), then continues to process the data and ends itself.
These are the only ways I can imagine that "reading with multiple threads on a single socket" is achievable, and even then there won't be multiple threads calling recv() at the same time.
Sorry for the long post and the spelling and grammar errors; I hope this helps you a bit.
Ensure that the socket is properly bound to the handle you are using in the recv() function.
You cannot speed up data reception unless there is a channel to receive the data on.
I am working in a multithreaded middleware environment. The framework is basically a capturing and streaming framework, so it involves a number of threads.
To give you a brief idea of the threading architecture:
There are separate threads for the demultiplexer, receiveVideo, DecodeVideo, DisplayVideo, etc. Each thread performs its own function, e.g.:
the demultiplexer extracts audio and video packets
receiveVideo receives the header + payload of each video packet and extracts the payload
DecodeVideo receives the payload and decodes it
DisplayVideo receives the decoded packets and displays them
Thus each thread feeds the extracted data to the next thread. The threads share data buffers among themselves, and the buffers are synchronized through mutexes and semaphores. Similarly, there are other threads for handling analog video, analog audio, etc.
All the threads are spawned during initialization, but they remain blocked on a semaphore; depending on the input (analog/digital), selected semaphores are signalled so that specific threads get unblocked and move on to do their work. At various stages, each thread makes some lower-level (driver) calls to get or write data. These calls are blocking, and the errors resulting from them (the driver returning corrupted data, the driver stalling) should be handled but are not being handled currently.
I want to implement a thread-monitoring mechanism, where a monitor thread watches these worker threads and takes some preventive action if an error condition occurs. As I understand it, such mechanisms are commonly used, like watchdogs in UI or MMI applications. I am trying to find something similar.
I am using pthreads and no Boost or STL (it's legacy code, pretty much procedural C++).
Any ideas about specific frameworks, design patterns, or open-source projects that do something similar and might help with ideas for implementing my requirement?
Can you ping the threads? Periodically send each one a message on its usual input queue, interleaved with all the other normal stuff, asking it to return its status. When each handler thread gets the message, it loads the message with status information (how many messages it has processed since the last ping, the length of its input/output queues, the last time its driver returned OK, that sort of stats) and queues it back to your Thread Monitoring Mechanism. Your TMM would have to time out the replies in case some thread/s is/are stuck.
You could, maybe, just post one message down the whole chain, each thread adding its own status in different fields. That would mean only one timeout, after which your TMM would have to examine the message to see how far down the chain it got.
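A hedged sketch of the ping idea (Message, WorkQueue, and the fields are all made-up names; the real status fields would be whatever stats your threads can report):

    #include <chrono>
    #include <condition_variable>
    #include <cstddef>
    #include <mutex>
    #include <queue>

    struct Message {
      enum Kind { WORK, PING, PONG } kind;
      int processed_since_last_ping = 0;  // filled in by the worker on PONG
      std::size_t input_queue_len = 0;
    };

    class WorkQueue {
     public:
      void Push(Message m) {
        std::lock_guard<std::mutex> lock(mu_);
        q_.push(m);
        cv_.notify_one();
      }
      // Pop with a timeout so the monitor can detect a wedged worker.
      bool PopFor(Message* out, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mu_);
        if (!cv_.wait_for(lock, timeout, [this] { return !q_.empty(); }))
          return false;                   // timed out: no reply arrived
        *out = q_.front();
        q_.pop();
        return true;
      }
     private:
      std::queue<Message> q_;
      std::mutex mu_;
      std::condition_variable cv_;
    };

The monitor would Push a PING onto each worker's queue and then PopFor on its own reply queue; a false return from PopFor means the reply timed out and that worker may be stuck.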
There are other things. I like to keep an on-screen dump, on a 1 s timer, of the length of the queues and the depth of the buffer pools. If something gets stuck, I can usually tell roughly where it is (e.g. a pool is emptying and some queue is growing: the queue consumer is wedged).
Rgds,
Martin
What about using a signalling system to wake up your monitoring thread when something goes awry in one of your worker threads? You can emulate the signalling with a ResetEvent of some type.
When an exception occurs in your worker thread, you have some data structure you fill up with the data about the exception and then you can pass that on to your monitoring thread. You wake up the monitoring thread by using the event.
Then the monitoring thread can do what you need it to do.
I'm guessing you don't wish to have your monitoring thread active unless something has gone wrong, right?
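Since you're on pthreads, the "event" can be a condition variable. A hedged sketch (the struct and field names are made up; the real payload would be whatever your error data is):

    #include <pthread.h>
    #include <string.h>

    struct ErrorReport {
      pthread_mutex_t mu;
      pthread_cond_t cond;
      int pending;       /* 1 when an unread report is waiting */
      int worker_id;
      char what[128];
    };

    /* call once at startup */
    void report_init(ErrorReport* r) {
      pthread_mutex_init(&r->mu, 0);
      pthread_cond_init(&r->cond, 0);
      r->pending = 0;
    }

    /* called by a worker thread when it hits an error condition */
    void report_error(ErrorReport* r, int worker_id, const char* what) {
      pthread_mutex_lock(&r->mu);
      r->worker_id = worker_id;
      strncpy(r->what, what, sizeof(r->what) - 1);
      r->what[sizeof(r->what) - 1] = '\0';
      r->pending = 1;
      pthread_cond_signal(&r->cond);  /* wake the monitoring thread */
      pthread_mutex_unlock(&r->mu);
    }

    /* monitoring thread body: sleeps until some worker reports an error */
    void* monitor_main(void* arg) {
      ErrorReport* r = (ErrorReport*)arg;
      pthread_mutex_lock(&r->mu);
      for (;;) {
        while (!r->pending)
          pthread_cond_wait(&r->cond, &r->mu);
        r->pending = 0;
        /* ...take preventive action based on r->worker_id / r->what... */
      }
      pthread_mutex_unlock(&r->mu);  /* not reached in this sketch */
      return 0;
    }

This keeps the monitoring thread completely idle (blocked in pthread_cond_wait) until a worker actually reports something.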
I am developing a Windows proxy program in which two TCP sockets, connected through different adapters, are bridged by my program. That is, my program reads from one socket and writes to the other, and vice versa. Each socket is handled by its own thread. When one socket reads data, it is queued for the other socket to write. The problem I have is the case where one link runs at 100Mb and the other at 10Mb: I read data from the 100Mb link faster than I can write it to the 10Mb link. How can I "slow down" the faster connection so that it effectively runs at the slower link speed? Changing the faster link to a slower speed is not an option. --Thanks
Create a fixed-length queue between the reading and writing threads. Block on the enqueue when the queue is full and on the dequeue when it's empty. A regular semaphore or a mutex/condition variable will work. Play with the queue size so that the slower thread is always busy.
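A minimal sketch of such a bounded queue with a mutex and condition variables (Chunk and the names are made up; a semaphore pair would work equally well):

    #include <condition_variable>
    #include <deque>
    #include <mutex>
    #include <utility>
    #include <vector>

    using Chunk = std::vector<char>;   // one unit of data read from a socket

    class BoundedQueue {
     public:
      explicit BoundedQueue(std::size_t cap) : cap_(cap) {}

      void Enqueue(Chunk c) {          // reader thread blocks here when full
        std::unique_lock<std::mutex> lock(mu_);
        not_full_.wait(lock, [&] { return q_.size() < cap_; });
        q_.push_back(std::move(c));
        not_empty_.notify_one();
      }

      Chunk Dequeue() {                // writer thread blocks here when empty
        std::unique_lock<std::mutex> lock(mu_);
        not_empty_.wait(lock, [&] { return !q_.empty(); });
        Chunk c = std::move(q_.front());
        q_.pop_front();
        not_full_.notify_one();
        return c;
      }

     private:
      std::size_t cap_;
      std::deque<Chunk> q_;
      std::mutex mu_;
      std::condition_variable not_empty_, not_full_;
    };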
If this is a problem, then you're writing your program incorrectly.
You can't put more than 10Mbps onto a 10Mbps link, so the thread that writes to the slower link will start to block as you write. As long as that thread uses the same size read buffer as write buffer, it will only consume data as quickly as it can push it back out the 10Mbps pipe. Any flow control needed to keep the remote sender from putting more than 10Mbps into the 100Mbps pipe to you is taken care of automatically by the TCP protocol.
So it just shouldn't be an issue, as long as your read and write buffers are the same size in that thread (or any thread).
Stop reading the data when you are not able to write it.
There is a queue of bytes coming into your program from the 100Mb/s link and a queue out of your program to the 10Mb/s link. When the outgoing queue is full, stop reading from the incoming queue, and TCP will throttle back the client on the 100Mb/s link.
You can use an internal queue between the reader and the writer to implement this cleanly.
A lot of complicated (and correct) solutions have been expounded. But really, to get to the crux of the matter: why do you have two threads? If you did the socket-100 read and the socket-10 write in a single thread, it would naturally block on the write, and you wouldn't have to design anything complicated.
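For instance, the single-threaded forwarding loop could be as simple as this (a sketch assuming Winsock, since this is a Windows program; on POSIX you would use int descriptors and <sys/socket.h> instead):

    #include <winsock2.h>

    // recv() from the fast socket, then loop on send() to the slow one;
    // send() blocking when the 10Mb side's buffers are full is the throttle.
    void forward(SOCKET fast, SOCKET slow) {
      char buf[4096];
      for (;;) {
        int n = recv(fast, buf, (int)sizeof(buf), 0);
        if (n <= 0) break;                 // 0 = peer closed, <0 = error
        int off = 0;
        while (off < n) {                  // handle short writes
          int w = send(slow, buf + off, n - off, 0);
          if (w <= 0) return;              // error on the slow side
          off += w;
        }
      }
    }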
If you are doing a non-blocking, select()-style event loop: only call FD_SET(readSocket, &readSet) if your outgoing-data queue is smaller than some hard-coded maximum size.
That way, when the outgoing socket falls behind, your proxy will stop reading data from the faster client until it catches back up. The TCP protocol will take care of the rest (in particular, it will tell your faster client to slow down for a while).
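A sketch of that conditional FD_SET logic (out_queue_bytes and MAX_QUEUED are assumptions: the bytes currently queued for the slow link, and a made-up cap):

    #include <winsock2.h>
    #include <cstddef>

    const std::size_t MAX_QUEUED = 64 * 1024;  // made-up cap on queued bytes

    // Decide which sockets to watch based on how much output is backed up.
    void build_fd_sets(SOCKET readSocket, SOCKET writeSocket,
                       std::size_t out_queue_bytes,
                       fd_set* readSet, fd_set* writeSet) {
      FD_ZERO(readSet);
      FD_ZERO(writeSet);
      if (out_queue_bytes < MAX_QUEUED)
        FD_SET(readSocket, readSet);    // room left: keep reading the fast side
      if (out_queue_bytes > 0)
        FD_SET(writeSocket, writeSet);  // data pending: flush to the slow side
      // The caller then calls select(0, readSet, writeSet, NULL, NULL);
      // on Windows the first argument to select() is ignored.
    }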