Guarantee order of messages posted to mailbox processor - concurrency

I have a mailbox processor which receives a fixed number of messages:
let consumeThreeMessages = MailboxProcessor.Start(fun inbox ->
    async {
        let! msg1 = inbox.Receive()
        printfn "msg1: %s" msg1
        let! msg2 = inbox.Receive()
        printfn "msg2: %s" msg2
        let! msg3 = inbox.Receive()
        printfn "msg3: %s" msg3
    }
)
consumeThreeMessages.Post("First message")
consumeThreeMessages.Post("Second message")
consumeThreeMessages.Post("Third message")
These messages should be handled in exactly the order sent. During my testing, it prints out exactly what it should:
First message
Second message
Third message
However, since message posting is asynchronous, it sounds like posting 3 messages rapidly could result in items being processed in any order. For example, I do not want to receive messages out of order and get something like this:
Second message // <-- oh noes!
First message
Third message
Are messages guaranteed to be received and processed in the order sent? Or is it possible for messages to be received or processed out of order?

The code in your consumeThreeMessages function will always execute in order, because of the way F#'s async workflows work.
The following code:
async {
    let! msg1 = inbox.Receive()
    printfn "msg1: %s" msg1
    let! msg2 = inbox.Receive()
    printfn "msg2: %s" msg2
}
Roughly translates to:
async.Bind(
    inbox.Receive(),
    (fun msg1 ->
        printfn "msg1: %s" msg1
        async.Bind(
            inbox.Receive(),
            (fun msg2 -> printfn "msg2: %s" msg2)
        )
    )
)
When you look at the desugared form, it is clear that the code executes serially. The 'async' part comes into play in the implementation of async.Bind, which starts the operation asynchronously and 'wakes up' when it completes to run the rest of the computation. This way you can take advantage of asynchronous hardware operations and avoid tying up OS threads waiting on I/O.
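To make the 'wake up' part concrete, here is a small self-contained sketch (the Async.Sleep and the placeholder messages are invented for illustration) of starting an async operation and registering a continuation instead of blocking a thread:

// A rough sketch: start the asynchronous operation and register continuations;
// the success continuation only runs once the operation has actually produced
// its result, which is how Bind resumes the rest of the workflow.
let work = async {
    do! Async.Sleep 100
    return "First message" }

Async.StartWithContinuations(
    work,
    (fun result -> printfn "completed with: %s" result), // success
    (fun ex -> printfn "failed: %O" ex),                 // error
    (fun _ -> printfn "cancelled"))                      // cancellation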
That doesn't mean you can't run into concurrency issues when using F#'s async workflows, however. Imagine that you did the following:
let total = ref 0
let doTaskAsync() =
    async {
        for i = 0 to 1000 do
            incr total
    } |> Async.Start

// Start the task twice
doTaskAsync()
doTaskAsync()
The above code starts two asynchronous workflows that modify the same state at the same time, so increments can be lost and the final value of total is unpredictable.
So, to answer your question in brief: within the body of a single async block, things will always execute in order. (That is, the next line after a let! or do! doesn't execute until the async operation completes.) However, if you share mutable state between two async tasks, then all bets are off. In that case you will need to consider locking or using the concurrent collections that ship with .NET 4.0.
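For example, here is a minimal sketch of the locking option applied to the racy counter above; locking every increment is the simplest fix rather than the fastest, and System.Threading.Interlocked.Increment would work equally well for a plain counter:

let sync = obj ()
let total = ref 0

let doTaskAsync() =
    async {
        for i = 0 to 1000 do
            // Serialize every update to the shared counter so that
            // concurrent workflows cannot interleave the read-modify-write.
            lock sync (fun () -> incr total)
    } |> Async.Start

// Start the task twice; the final count is now deterministic.
doTaskAsync()
doTaskAsync()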

Related

Is there a way to execute a particular flow after every other flow without needing to plug it in explicitly

I have multiple flows (to process messages received from a queue) to execute, and after every flow I need to check whether the previous flow produced an error; if it did, I filter out the message being processed, otherwise I continue to the next flow.
Currently I have to plug this error-handler flow in explicitly after every other flow. Is there any way to configure this error flow to run after every other flow, or is there a better way to do this?
Example:
flow 1 -> Validate message, if error, mark message as error
error flow -> check if message is marked error, if yes filter, otherwise continue.
flow 2 -> persist message to db, mark in case of error.
error flow -> check if message is marked error, if yes filter, otherwise continue
flow 3 -> and so on.
Or is there a way to wrap (flow 1 + error flow), (flow 2 + error flow), and so on?
I am not sure it is exactly what you asked for, but I have a partial solution. What can be done is to create all of the flows first; for instance, we can look at:
val flows = Seq(
  Flow.fromFunction[Int, Int](x => { println(s"flow1: Received $x"); x * 2 }),
  Flow.fromFunction[Int, Int](x => { println(s"flow2: Received $x"); x + 1 }),
  Flow.fromFunction[Int, Int](x => { println(s"flow3: Received $x"); x * x })
)
Then we need to append the error handling to each of the existing flows. So let's define it, and add it to each of the elements:
val errorHandling = Flow[Int].filter(_ % 2 == 0)
val errorsHandledFlows = flows.map(flow => flow.via(errorHandling))
Now we need a helper function that will connect all of our new flows:
def connectFlows(errorsHandledFlows: Seq[Flow[Int, Int, _]]): Flow[Int, Int, _] = {
  errorsHandledFlows match {
    case Seq() => Flow[Int] // Never hit in practice, but it keeps the match exhaustive
    case Seq(singleFlow) => singleFlow
    case Seq(head, tail @ _*) => head.via(connectFlows(tail))
  }
}
And now we need to run everything together, for example:
Source(1 to 4).via(connectFlows(errorsHandledFlows)).to(Sink.foreach(println)).run()
This will produce the output:
flow1: Received 1
flow2: Received 2
flow1: Received 2
flow2: Received 4
flow1: Received 3
flow2: Received 6
flow1: Received 4
flow2: Received 8
As you can tell, the error-handling stage filters out the odd numbers. The first flow therefore gets all the numbers from 1 to 4, the second flow receives 2, 4, 6, 8 (the first flow multiplied the values by 2), and the last one receives nothing, because the second flow makes all of the values odd.
You can also use Merge
val g = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder =>
  import GraphDSL.Implicits._
  val merge = builder.add(Merge[Int](3))
  val flow1 = ...
  val flow2 = ...
  val flow3 = ...
  flow1 ~> merge
  flow2 ~> merge
  flow3 ~> merge
  ClosedShape
})
Not sure if it meets your need; just showing the alternative. (Note that before this graph can be materialized, the merge outlet also has to be connected, for example to a Sink.)

OCaml - Parmap executing Lwt threads hangs on the execution

This is a follow-up to this question:
How to synchronously execute an Lwt thread
I am trying to run the following piece of code:
open Lwt
open Cohttp_lwt_unix
let server_content2 x =
  "in server content x" |> print_endline ;
  Client.get (Uri.of_string ("http://localhost:8080/" ^ x)) >>= fun (_, body) ->
  (Cohttp_lwt.Body.to_string body) >|= fun sc -> sc
;;

let reyolo () =
  List.init 10 (fun i -> server_content2 (string_of_int i)) ;;

let par () =
  let yolo = reyolo () in
  "in par" |> print_endline;
  Parmap.pariter
    ~ncores:4
    (fun p ->
       "before run" |> print_endline ;
       "content:" ^ (Lwt_main.run p) |> print_endline ;
       "after run" |> print_endline)
    (Parmap.L yolo);;
par ()
I expected this to perform 10 remote connections.
What I get is that, in the par function, Lwt_main.run seems to get stuck before making an actual remote call.
I doubt it is of any significance, but the server that is supposed to respond is written in Python and looks like this:
import subprocess
from bottle import run, post, request, response, get, route

@route('/<path>', method='GET')
def process(path):
    print(path)
    return "yolo"

run(host='localhost', port=8080, debug=True)
The issue is that the calls to server_content2, which start the requests, occur in the parent process. The code then tries to finish them in the child processes spawned by Parmap. Lwt breaks here: it cannot, in general, keep track of I/Os across a fork.
If you store either thunks or arguments in the list yolo, and delay the calls to server_content2 so that they are done in the child processes, the requests should work. To do that, make sure the calls happen in the callback of Parmap.pariter.

Synchronized, wait and notify in Akka

I'd like to ask a question. I have one "Rec" actor and several "Sen" actors. The first one has a list of messages that must be forwarded, and the senders are actors that keep sending messages to the receiver.
Something like this:
class Rec (frw: Actor) extends Actor
{
  var myList: List[Int] = Nil;
  def act =
  {
    case "Start" => while(true) { /* Extract the first number and send it to frw */ } // I must use a new actor if I want to keep reading messages
    case i => myList :+= i; // Should I use a "synchronize" here (I'm extracting elements with the other actor after all)?
  }
}

class Sen (rcv: Actor) extends Actor
{
  var i = 100;
  def act =
  {
    case "Start" => while(i > 0) { i -= 1; rcv ! i; }
  }
}
My problem is that Rec must forward just one message at a time, so it must save all the messages in a list. Is there a way to improve the Rec actor? I don't like the while(true), but I can't extract the first number whenever I receive a message (messages could stay in the list for too long). Should I use synchronized/wait/notify, or is there something better in Akka?
Thank you, and sorry for my bad English.
According to your code you do not need the Rec actor at all. You will achieve the same result if you send messages from Sen to frw directly.
You have to remember that Akka actors process messages sequentially, one after another. So all your messages from Sen will arrive at frw one at a time, even if you send them directly from Sen to frw.

How to communicate between Agents?

Using the MailboxProcessor in F#, what is the preferred way to communicate between agents? Wrapping the agents in objects, like:
type ProcessAgent(saveAgent:SaveAgent) = ...
type SaveAgent() = ...
let saveAgent = new SaveAgent()
let processAgent = new ProcessAgent(saveAgent)
or what about:
type ProcessAgent(cont:string -> unit) = ...
type SaveAgent() = ...
let saveAgent = new SaveAgent()
let processAgent = new ProcessAgent(fun z -> saveAgent.Add z)
or maybe even something like:
type ProcessAgent() = ...
type SaveAgent() = ...
let saveAgent = new SaveAgent()
let processAgent = new ProcessAgent()
processAgent.Process item (fun z -> saveAgent.Add z)
Also, is there ever any reason to wrap a normal function, one that is not maintaining some kind of state, in an agent?
The key thing about encapsulating the agents in classes is that it lets you break the direct dependencies between them. So, you can create the individual agents and then connect them into a bigger "agent network" just by registering event handlers, calling methods, etc.
An agent can essentially expose three kinds of members:
Actions are members of type 'T -> unit. They send some message to the agent without waiting for any reply from the agent. This is essentially wrapping a call to agent.Post.
Blocking actions are members of type 'T -> Async<'R>. This is useful when you're sending some message to the agent but then want to wait for a response (or for confirmation that the action was processed). These do not block a physical thread (the wait is asynchronous), but the caller does not continue until the reply arrives. This is essentially wrapping a call to agent.PostAndAsyncReply.
Notifications are members of type IEvent<'T> or IObservable<'T> representing some sort of notification reported from the agent - e.g. when the agent finishes doing some work and wants to notify the caller (or other agents).
In your example, the processing agent is doing some work (asynchronously) and then returns the result, so I think it makes sense to use a "Blocking action". The operation of the saving agent is just an "Action", because it does not return anything. To demonstrate the last case, I'll add a "flushed" notification, which is triggered when the saving agent saves all queued items to the actual storage:
// Two-way communication processing a string
type ProcessMessage =
    PM of string * AsyncReplyChannel<string>

type ProcessAgent() =
    let agent = MailboxProcessor.Start(fun inbox -> async {
        while true do
            let! (PM(s, repl)) = inbox.Receive()
            repl.Reply("Echo: " + s) })

    // Takes input, processes it and asynchronously returns the result
    member x.Process(input) =
        agent.PostAndAsyncReply(fun ch -> PM(input, ch))

type SaveAgent() =
    let flushed = Event<_>()
    let agent = (* ... *)

    // Action to be called to save a new processed value
    member x.Add(res) =
        agent.Post(res)

    // Notification triggered when the cache is flushed
    member x.Flushed = flushed.Publish
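The body of the saving agent is elided above. Purely as an illustration, assuming the agent buffers items and flushes them in batches (the threshold of ten below is invented, not part of the original answer), the elided binding might look roughly like this:

// Hypothetical body for the elided 'agent' inside SaveAgent: buffer the
// processed values and trigger the Flushed notification once a (made-up)
// threshold of ten items has been written out.
let agent = MailboxProcessor.Start(fun inbox -> async {
    let buffer = ResizeArray<string>()
    while true do
        let! item = inbox.Receive()
        buffer.Add(item)
        if buffer.Count >= 10 then
            // ... write the buffered items to the actual storage here ...
            buffer.Clear()
            flushed.Trigger() })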
Then you can create both agents and connect them in various ways using the members:
let proc = ProcessAgent()
let save = SaveAgent()

// Process an item and then save the result
async {
    let! r = proc.Process("Hi")
    save.Add(r) }
|> Async.Start

// Listen to flushed notifications
save.Flushed |> Event.add (fun () ->
    printfn "flushed..." )
You don't need to create a class for your agents. Why not just write a function that returns your MailboxProcessor?
let MakeSaveAgent() =
    MailboxProcessor<SaveMessageType>.Start(fun inbox ->
        (* etc... *))

let MakeProcessAgent (saveAgent: MailboxProcessor<SaveMessageType>) =
    MailboxProcessor<ProcessMessageType>.Start(fun inbox ->
        (* etc... you can send messages to saveAgent here *))
For your final question: no, not really, that would be adding unnecessary complication when a simple function returning Async<_> would suffice.
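For instance, a stateless operation can just be an ordinary function returning Async<_>. Here is a minimal sketch; the string-echo transform is made up purely for illustration:

// No agent needed: there is no state to protect, so a plain async function
// can be called concurrently from anywhere without a mailbox in front of it.
let processItem (input: string) : Async<string> =
    async {
        // purely stateless work, e.g. parsing or formatting
        return "Echo: " + input
    }

// Used just like a blocking action exposed by an agent:
async {
    let! result = processItem "Hi"
    printfn "%s" result
} |> Async.Start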

Getting responses from erlang processes

I have an erlang project that makes a lot of concurrent SOAP requests to my application. Currently, it's limited by how many nodes are available, but I would like to adjust it so that each node can send more than one message at a time.
I've figured that problem out, but I don't know how to get a response back from the process running the SOAP request.
This is the function I'm using to spawn the worker processes:
batch(Url, Message, BatchSize) ->
    inets:start(),
    Threads = for(1, BatchSize, fun() -> spawn(fun() -> attack_thread() end) end),
    lists:map(fun(Pid) -> Pid ! {Url, Message, self()} end, Threads).
This function is called by whoever initiates the stress test, and it runs on every node in our network. It is called repeatedly until the requested number of SOAP requests have been sent and timed.
This is the attack_thread function that the batch function sends its message to:
attack_thread() ->
    receive
        {Url, Message, FromPID} ->
            {TimeTaken, {ok, {{_, 200, _}, _, _}}} = timer:tc(httpc, request, [post, {Url, [{"connection", "close"}, {"charset", "utf-8"}], "text/xml", Message}, [], []]),
            TimeTaken/1000/1000
    end.
As you can see, I want it to return the number of seconds the SOAP request took. However, Erlang's message passing (Pid ! Message) doesn't return anything useful.
How can I get a result back?
Each of your attack_thread() processes can simply drop a message in the mailbox of the process running the batch/3 function:
FromPID ! {time_taken, self(), TimeTaken / 1000 / 1000}.
but then you need to collect the results:
batch(Url, Message, BatchSize) ->
    inets:start(),
    Pids = [spawn_link(fun attack_thread/0) || _ <- lists:seq(1, BatchSize)],
    [Pid ! {Url, Message, self()} || Pid <- Pids],
    collect(Pids).

collect([]) -> [];
collect(Pids) ->
    receive
        {time_taken, P, Time} ->
            [Time | collect(Pids -- [P])]
    end.
Some other comments: you probably want spawn_link/1 here. If something dies along the way, you want the whole thing to die. Also, be sure to tune inets httpc a bit so it is more effective. You might also want to look at basho_bench or tsung.
Finally, you can use a closure directly rather than passing the URL and message:
attack_thread(Url, Message, From) -> ...
So your spawn is:
Self = self(),
Pids = [spawn_link(fun() -> attack_thread(Url, Message, Self) end) || _ <- ...]
This avoids having to send the URL and message to each process as an initial message.