Use Post or PostAndAsyncReply with F#'s MailboxProcessor?

Use Post or PostAndAsyncReply with F#'s MailboxProcessor? - concurrency

I've seen different snippets demonstrating a Put message that returns unit with F#'s MailboxProcessor. In some, only the Post method is used while others use PostAndAsyncReply, with the reply channel immediately replying once the message is being processed. In doing some testing, I found a significant time lag when awaiting the reply, so it seems that unless you need a real reply, you should use Post.
Note: I started asking this in another thread but thought it useful to post as a full question. In the other thread, Tomas Petricek mentioned that the reply channel could be used a wait mechanism to ensure the caller delayed until the Put message was processed.
Does using PostAndAsyncReply help with message ordering, or is it just to force a pause until the first message is processed? In terms of performance Post appears the right solution. Is that accurate?
Update:
I just thought of a reason why PostAndAsyncReply might be necessary in the BlockingQueueAgent example: Scan is used to find Get messages when the queue is full, so you don't want to Put and then Get before the previous Put has completed.

I think I generally agree with your summary - it makes sense that PostAndAsyncReply is slower than Post, so if the caller doesn't need to get a notification from the agent when the operation (such as putting value into the queue) completes, it should definitely expose a way to do that using just Post. The fact that PostAndAsyncReply is a lot slower probably means that some agents should expose both options and let the caller decide.
Regarding the specific example of BlockingQueueAgent (or a similar one that I used to implement one-place buffer), the typical application of the agent is to solve the consumer-producer problem. In consumer-producer problem, we want to block the producer when the queue is full and block the consumer when it is empty. The .NET BlockingCollection supports only synchronous blocking, which is a bit bad (i.e. it can block the whole thread pool).
The using the BlockingQueueAgent that sends the Put messsage using PostAndAsyncReply, we can wait until the element is added to the queue asynchronously (so it blocks the producer, but without blocking threads!) An example of typical usage is the image processing pipeline that I wrote some time ago. Here is one snippet from that:
// Phase 2: Scale to a thumbnail size and add frame
let scalePipelinedImages = async {
while true do
let! info = loadedImages.AsyncGet()
scaleImage info
do! scaledImages.AsyncAdd(info) }
This loop repeatedly gets an image from the loadedImages queue, does some processing and writes the result to scaledImages. The blocking using the queue (both when reading and when writing) controls the parallelism, so that the steps of pipeline run in parallel, but it does not keep loading more and more images if the pipeline cannot handle them at the required speed.

My advice is to design your system so you can use Post as much as possible.
This technology was designed for asynchronous concurrency where the objective is to fire-and-forget messages. The idea of waiting for a response goes directly against the grain of this.

Related

How to implement long running gRPC async streaming data updates in C++ server

I'm creating an async gRPC server in C++. One of the methods streams data from the server to clients - it's used to send data updates to clients. The frequency of the data updates isn't predictable. They could be nearly continuous or as infrequent as once per hour. The model used in the gRPC example with the "CallData" class and the CREATE/PROCESS/FINISH states doesn't seem like it would work very well for that. I've seen an example that shows how to create a 'polling' loop that sleeps for some time and then wakes up to check for new data, but that doesn't seem very efficient.
Is there another way to do this? If I use the "CallData" method can it block in the 'PROCESS' state until there's data (which probably wouldn't be my first choice)? Or better, can I structure my code so I can notify a gRPC handler when data is available?
Any ideas or examples would be appreciated.

In a server-side streaming example, you probably need more states, because you need to track whether there is currently a write already in progress. I would add two states, one called WRITE_PENDING that is used when a write is in progress, and another called WRITABLE that is used when a new message can be sent immediately. When a new message is produced, if you are in state WRITABLE, you can send immediately and go into state WRITE_PENDING, but if you are in state WRITE_PENDING, then the newly produced message needs to go into a queue to be sent after the current write finishes. When a write finishes, if the queue is non-empty, you can grab the next message from the queue and immediately start a write for it; otherwise, you can just go into state WRITABLE and wait for another message to be produced.
There should be no need to block here, and you probably don't want to do that anyway, because it would tie up a thread that should otherwise be polling the completion queue. If all of your threads wind up blocked that way, you will be blind to new events (such as new calls coming in).
An alternative here would be to use the C++ sync API, which is much easier to use. In that case, you can simply write straight-line blocking code. But the cost is that it creates one thread on the server for each in-progress call, so it may not be feasible, depending on the amount of traffic you're handling.
I hope this information is helpful!

What happens to running processes on a continuous Azure WebJob when website is redeployed?

I've read about graceful shutdowns here using the WEBJOBS_SHUTDOWN_FILE and here using Cancellation Tokens, so I understand the premise of graceful shutdowns, however I'm not sure how they will affect WebJobs that are in the middle of processing a queue message.
So here's the scenario:
I have a WebJob with functions listening to queues.
Message is added to Queue and job begins processing.
While processing, someone pushes to develop, triggering a redeploy.
Assuming I have my WebJobs hooked up to deploy on git pushes, this deploy will also trigger the WebJobs to be updated, which (as far as I understand) will kick off some sort of shutdown workflow in the jobs. So I have a few questions stemming from that.
Will jobs in the middle of processing a queue message finish processing the message before the job quits? Or is any shutdown notification essentially treated as "this bitch is about to shutdown. If you don't have anything to handle it, you're SOL."
If we are SOL, is our best option for handling shutdowns essentially to wrap anything you're doing in the equivalent of DB transactions and implement your shutdown handler in such a way that all changes are rolled back on shutdown?
If a queue message is in the middle of being processed and the WebJob shuts down, will that message be requeued? If not, does that mean that my shutdown handler needs to handle requeuing that message?
Is it possible for functions listening to queues to grab any more queue messages after the Job has been notified that it needs to shutdown?
Any guidance here is greatly appreciated! Also, if anyone has any other useful links on how to handle job shutdowns besides the ones I mentioned, it would be great if you could share those.

After no small amount of testing, I think I've found the answers to my questions and I hope someone else can gain some insight from my experience.
NOTE: All of these scenarios were tested using .NET Console Apps and Azure queues, so I'm not sure how blobs or table storage, or different types of Job file types, would handle these different scenarios.
After a Job has been marked to exit, the triggered functions that are running will have the configured amount of time (grace period) (5 seconds by default, but I think that is configurable by using a settings.job file) to finish before they are exited. If they do not finish in the grace period, the function quits. Main() (or whichever file you declared host.RunAndBlock() in), however, will finish running any code after host.RunAndBlock() for up to the amount of time remaining in the grace period (I'm not sure how that would work if you used an infinite loop instead of RunAndBlock). As far as handling the quit in your functions, you can essentially "listen" to the CancellationToken that you can pass in to your triggered functions for IsCancellationRequired and then handle it accordingly. Also, you are not SOL if you don't handle the quits yourself. Huzzah! See point #3.
While you are not SOL if you don't handle the quit (see point #3), I do think it is a good idea to wrap all of your jobs in transactions that you won't commit until you're absolutely sure the job has ran its course. This way if your function exits mid-process, you'll be less likely to have to worry about corrupted data. I can think of a couple scenarios where you might want to commit transactions as they pass (batch jobs, for instance), however you would need to structure your data or logic so that previously processed entities aren't reprocessed after the job restarts.
You are not in trouble if you don't handle job quits yourself. My understanding of what's going on under the covers is virtually non-existent, however I am quite sure of the results. If a function is in the middle of processing a queue message and is forced to quit before it can finish, HAVE NO FEAR! When the job grabs the message to process, it will essentially hide it on the queue for a certain amount of time. If your function quits while processing the message, that message will "become visible" again after x amount of time, and it will be re-grabbed and ran against the potentially updated code that was just deployed.
So I have about 90% confidence in my findings for #4. And I say that because to attempt to test it involved quick-switching between windows while not actually being totally sure what was going on with certain pieces. But here's what I found: on the off chance that a queue has a new message added to it in the grace period b4 a job quits, I THINK one of two things can happen: If the function doesn't poll that queue before the job quits, then the message will stay on the queue and it will be grabbed when the job restarts. However if the function DOES grab the message, it will be treated the same as any other message that was interrupted: it will "become visible" on the queue again and be reran upon the restart of the job.
That pretty much sums it up. I hope other people will find this useful. Let me know if you want any of this expounded on and I'll be happy to try. Or if I'm full of it and you have lots of corrections, those are probably more welcome!

Help needed for implementing a thread Monitoring Mechanism

I am working on a multithreaded middleware enviornment. The framework is basically a capturing and streaming framework. So it involves a number of threads.
To give you all a brief idea of the threading architecture:
There are seprate threads for demultiplexer, receiveVideo, DecodeVideo, DisplayVideo etc. Each thread performs its functionlity, for eg:
demultiplexer extracts audio, video packets
receivevideo receives header + payload of video packet & removes payload
DecodeVideo receives payload & decodes payload packet
DisplayVideo receives decoded packets & displays the decoded packets on display
Thus each thread feeds the extracted data to the next thread. The threads share data buffers amongst them and the buffers are synchronised through use of mutexes and semaphores. Similarly, there are other threads for handling ananlogvideo and analogaudio etc.
All the threads are spawned in during initialization but they remain blocked on a semaphore and depending upon the input(analog/digitial) selective semaphores are signalled so that specifc threads get unblocked & move on to do their work. At various stages each thread calls some lower level(driver calls)to get data or write data etc. These calls are blocking and the errors resulting from these calls(driver returning corrupted data, driver stalling) should be handled but are not being handled currently.
I wanted to implement a thread monitoring mechanism where a thread will monitor these worker threads and if an error condition occurs will take some preventive actions. As I understand certain such mechanisms are commonly used like Watchdogs in UI or MMI applications. I am trying to look for something similar.
I am using pthreads and No Boost or STL(its a legacy code, pretty much procedural C++)
Any ideas about specific framework or design patterns or open source projects which do something similar and might help in with ideas for implementing my requirement?

Can you ping the threads - periodically send each one a message on its usual input queue, interleaved with all the other normal stuff, asking it to return its status? When each handler thread gets the message, it loads the message with status stuff - how many messages its processed since the last ping, length of its input/output queue, last time that its driver returned OK, that sort of stats - and queues it back to your Thread Monitoring Mechanism. Your TMM would have to time out the replies in case some thread/s is/are stuck.
You could, maybe, just post one message down the whole chain, each thread adding its own status in different fields. That would mean only one timeout, after which your TMM would have to examine the message to see how far down the chain it got.
There are other things - I like to keep an on-screen dump, on a 1s timer, of the length of queues and depth of buffer pools. If something stuffs, I can usually tell roughly where it is, (eg. a pool is emptying and some queue is growing - the queue comsumer is wasted).
Rgds,
Martin

What about using a signalling system to wake up your monitoring thread when something's gone awry in one of your worker threads. You can emulate the signalling with an ResetEvent of some type.
When an exception occurs in your worker thread, you have some data structure you fill up with the data about the exception and then you can pass that on to your monitoring thread. You wake up the monitoring thread by using the event.
Then the monitoring thread can do what you need it to do.
I'm guessing you don't wish to have your monitoring thread active unless something has gone wrong, right?

How to design a state machine in face of non-blocking I/O?

I'm using Qt framework which has by default non-blocking I/O to develop an application navigating through several web pages (online stores) and carrying out different actions on these pages. I'm "mapping" specific web page to a state machine which I use to navigate through this page.
This state machine has these transitions;
Connect, LogIn, Query, LogOut, Disconnect
and these states;
Start, Connecting, Connected, LoggingIn, LoggedIn, Querying, QueryDone, LoggingOut, LoggedOut, Disconnecting, Disconnected
Transitions from *ing to *ed states (Connecting->Connected), are due to LoadFinished asynchronous network events received from network object when currently requested url is loaded. Transitions from *ed to *ing states (Connected->LoggingIn) are due to events send by me.
I want to be able to send several events (commands) to this machine (like Connect, LogIn, Query("productA"), Query("productB"), LogOut, LogIn, Query("productC"), LogOut, Disconnect) at once and have it process them. I don't want to block waiting for the machine to finish processing all events I sent to it. The problem is they have to be interleaved with the above mentioned network events informing machine about the url being downloaded. Without interleaving machine can't advance its state (and process my events) because advancing from *ing to *ed occurs only after receiving network type of event.
How can I achieve my design goal?
EDIT
The state machine I'm using has its own event loop and events are not queued in it so could be missed by machine if they come when the machine is busy.
Network I/O events are not posted directly to neither the state machine nor the event queue I'm using. They are posted to my code (handler) and I have to handle them. I can forward them as I wish but please have in mind remark no. 1.
Take a look at my answer to this question where I described my current design in details. The question is if and how can I improve this design by making it
More robust
Simpler

Sounds like you want the state machine to have an event queue. Queue up the events, start processing the first one, and when that completes pull the next event off the queue and start on that. So instead of the state machine being driven by the client code directly, it's driven by the queue.
This means that any logic which involves using the result of one transition in the next one has to be in the machine. For example, if the "login complete" page tells you where to go next. If that's not possible, then the event could perhaps include a callback which the machine can call, to return whatever it needs to know.

Asking this question I already had a working design which I didn't want to write about not to skew answers in any direction :) I'm going to describe in this pseudo answer what the design I have is.
In addition to the state machine I have a queue of events. Instead of posting events directly to the machine I'm placing them in the queue. There is however problem with network events which are asynchronous and come in any moment. If the queue is not empty and a network event comes I can't place it in the queue because the machine will be stuck waiting for it before processing events already in the queue. And the machine will wait forever because this network event is waiting behind all events placed in the queue earlier.
To overcome this problem I have two types of messages; normal and priority ones. Normal ones are those send by me and priority ones are all network ones. When I get network event I don't place it in the queue but instead I send it directly to the machine. This way it can finish its current task and progress to the next state before pulling the next event from the queue of events.
It works designed this way only because there is exactly 1:1 interleave of my events and network events. Because of this when the machine is waiting for a network event it's not busy doing anything (so it's ready to accept it and does not miss it) and vice versa - when the machine waits for my task it's only waiting for my task and not another network one.
I asked this question in hope for some more simple design than what I have now.

Strictly speaking, you can't. Because you only have state "Connecting", you don't know whether you need top login afterwards. You'd have to introduce a state "ConnectingWithIntentToLogin" to represent the result of a "Connect, then Login" event from the Start state.
Naturally there will be a lot of overlap between the "Connecting" and the "ConnectingWithIntentToLogin" states. This is most easily achieved by a state machine architecture that supports state hierarchies.
--- edit ---
Reading your later reactions, it's now clear what your actual problem is.
You do need extra state, obviously, whether that's ingrained in the FSM or outside it in a separate queue. Let's follow the model you prefer, with extra events in a queue. The rick here is that you're wondering how to "interleave" those queued events vis-a-vis the realtime events. You don't - events from the queue are actively extracted when entering specific states. In your case, those would be the "*ed" states like "Connected". Only when the queue is empty would you stay in the "Connected" state.

If you don't want to block, that means you don't care about the network replies. If on the other hand the replies interest you, you have to block waiting for them. Trying to design your FSM otherwise will quickly lead to your automaton's size reaching infinity.

How about moving the state machine to a different thread, i. e. QThread. I would implent a input queue in the state machine so I could send queries non blocking and a output queue to read the results of the queries. You could even call back a slotted function in your main thread via connect(...) if a result of a query arrives, Qt is thread safe in this regard.
This way your state machine could block as long as it needs without blocking your main program.

Sounds like you just want to do a list of blocking I/O in the background.
So have a thread execute:
while( !commands.empty() )
{
command = command.pop_back();
switch( command )
{
Connect:
DoBlockingConnect();
break;
...
}
}
NotifySenderDone();

Network Multithreading

I'm programming an online game for two reasons, one to familiarize myself with server/client requests in a realtime environment (as opposed to something like a typical web browser, which is not realtime) and to actually get my hands wet in that area, so I can proceed to actually properly design one.
Anywho, I'm doing this in C++, and I've been using winsock to handle my basic, basic network tests. I obviously want to use a framelimiter and have 3D going and all of that at some point, and my main issue is that when I do a send() or receive(), the program kindly idles there and waits for a response. That would lead to maybe 8 fps on even the best internet connection.
So the obvious solution to me is to take the networking code out of the main process and start it up in its own thread. Ideally, I would call a "send" in my main process which would pass the networking thread a pointer to the message, and then periodically (every frame) check to see if the networking thread had received the reply, or timed out, or what have you. In a perfect world, I would actually have 2 or more networking threads running simultaneously, so that I could say run a chat window and do a background download of a piece of armor and still allow the player to run around all at once.
The bulk of my problem is that this is a new thing to me. I understand the concept of threading, but I can see some serious issues, like what happens if two threads try to read/write the same memory address at the same time, etc. I know that there are already methods in place to handle this sort of thing, so I'm looking for suggestions on the best way to implement something like this. Basically, I need thread A to be able to start a process in thread B by sending a chunk of data, poll thread B's status, and then receive the reply, also as a chunk of data., ideally without any major crashing going on. ^_^ I'll worry about what that data actually contains and how to handle dropped packets, etc later, I just need to get that happening first.
Thanks for any help/advice.
PS: Just thought about this, may make the question simpler. Is there a way to use the windows event handling system to my advantage? Like, would it be possible to have thread A initialize data somewhere, then trigger an event in thread B to have it pick up the data, and vice versa for thread B to tell thread A it was done? That would probably solve a lot of my problems, since I don't really need both threads to be able to work on the data at the same time, more of a baton pass really. I just don't know if this is possible between two different threads. (I know one thread can create its own messages for the event handler.)

The easiest thing
for you to do, would be to simply invoke the windows API QueueUserWorkItem. All you have to specify is the function that the thread will execute and the input passed to it. A thread pool will be automatically created for you and the jobs executed in it. New threads will be created as and when is required.
http://msdn.microsoft.com/en-us/library/ms684957(VS.85).aspx
More Control
You could have a more detailed control using another set of API's which can again manage the thread pool for you -
http://msdn.microsoft.com/en-us/library/ms686980(VS.85).aspx
Do it yourself
If you want to control all aspects of your thread creation and the pool management you would have to create the threads yourself, decide how they should end , how many to create etc (beginthreadex is the api you should be using to create threads. If you use MFC you should use AfxBeginThread function).
Send jobs to worker threads - Io completion Ports
In this case, you would also have to worry about how to communicate your jobs - i would recommend IoCOmpletionPorts to do that. It is the most scalable notification mechanism that i currently know of made for this purpose. It has the additional advantage that it is implemented in the kernel so you avoid all kinds of dead loack sitautions you would encounter if you decide to handroll something yourself.
This article will show you how with code samples -
http://blogs.msdn.com/larryosterman/archive/2004/03/29/101329.aspx
Communicate Back - Windows Messages
You could use windows messages to communicate the status back to your parent thread since it is doing the message wait anyway. use the PostMessage function to do this. (and check for errors)
ps : You could also allocate the data that needs to be sent out on a dedicated pointer and then the worker thread could take care of deleting it after sending it out. That way you avoid the return pointer traffic too.

BlodBath's suggestion of non-blocking sockets is potentially the right approach.
If you're trying to avoid using a multithreaded approach, then you could investigate the use of setting up overlapped I/O on your sockets. They will not block when you do a transmit or receive, but have the added bonus of giving you the option of waiting for multiple events within your single event loop. When your transmit has finished, you will receive an event. (see this for some details)
This is not incompatible with a multithreaded approach, so there's the option of changing your mind later. ;-)
On the design of your multithreaded app. the best thing to do is to work out all of the external activities that you want to be alerted to. For example, so far in your question you've listed network transmits, network receives, and user activity.
Depending on the number of concurrent connections you're going to be dealing with you'll probably find it conceptually simpler to have a thread per socket (assuming small numbers of sockets), where each thread is responsible for all of the processing for that socket.
Then you can implement some form of messaging system between your threads as RC suggested.
Arrange your system so that when a message is sent to a particular thread and event is also sent. Your threads can then be sent to sleep waiting for one of those events. (as well as any other stimulus - like socket events, user events etc.)
You're quite right that you need to be careful of situations where more than one thread is trying to access the same piece of memory. Mutexes and semaphores are the things to use there.
Also be aware of the limitations that your gui has when it comes to multithreading.
Some discussion on the subject can be found in this question.
But the abbreviated version is that most (and Windows is one of these) GUIs don't allow multiple threads to perform GUI operations simultaneously. To get around this problem you can make use of the message pump in your application, by sending custom messages to your gui thread to get it to perform gui operations.

I suggest looking into non-blocking sockets for the quick fix. Using non-blocking sockets send() and recv() do not block, and using the select() function you can get any waiting data every frame.

See it as a producer-consumer problem: when receiving, your network communication thread is the producer whereas the UI thread is the consumer. When sending, it's just the opposite. Implement a simple buffer class which gives you methods like push and pop (pop should be blocking for the network thread and non-blocking for the UI thread).
Rather than using the Windows event system, I would prefer something that is more portable, for example Boost condition variables.

I don't code games, but I've used a system similar to what pukku suggested. It lends nicely to doing things like having the buffer prioritize your messages to be processed if you have such a need.
I think of them as mailboxes per thread. You want to send a packet? Have the ProcessThread create a "thread message" with the payload to go on the wire and "send" it to the NetworkThread (i.e. push it on the NetworkThread's queue/mailbox and signal the condition variable of the NetworkThread so he'll wake up and pull it off). When the NetworkThread receives the response, package it up in a thread message and send it back to the ProcessThread in the same manner. Difference is the ProcessThread won't be blocked on a condition variable, just polling on mailbox.empty( ) when you want to check for the response.
You may want to push and pop directly, but a more convenient way for larger projects is to implement a toThreadName, fromThreadName scheme in a ThreadMsg base class, and a Post Office that threads register their Mailbox with. The PostOffice then has a send(ThreadMsg*); function that gets/pushes the messages to the appropriate Mailbox based on the to and from. Mailbox (the buffer/queue class) contains the ThreadMsg* = receiveMessage(), basically popping it off the underlying queue.
Depending on your needs, you could have ThreadMsg contain a virtual function process(..) that could be overridden accordingly in derived classes, or just have an ordinary ThreadMessage class with a to, from members and a getPayload( ) function to get back the raw data and deal with it directly in the ProcessThread.
Hope this helps.

Some topics you might be interested in:
mutex: A mutex allows you to lock access to specific resources for one thread only
semaphore: A way to determine how many users a certain resource still has (=how many threads are accessing it) and a way for threads to access a resource. A mutex is a special case of a semaphore.
critical section: a mutex-protected piece of code (street with only one lane) that can only be travelled by one thread at a time.
message queue: a way of distributing messages in a centralized queue
inter-process communication (IPC) - a way of threads and processes to communicate with each other through named pipes, shared memory and many other ways (it's more of a concept than a special technique)
All topics in bold print can be easily looked up on a search engine.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js