Observable defer in Akka Streams

I'm coming from ReactiveX, where we have the defer operator to create an Observable whose emitted value is only produced once a subscriber arrives.
Here in Akka Streams, I was wondering whether something like that exists:
@Test def defer(): Unit = {
  var range = 0 to 10
  val graphs = Source(range)
    .to(Sink.foreach(println))
  range = 10 to 20
  graphs.run()
  Thread.sleep(2000)
}
With this code, reassigning range before we execute run() has no effect: the blueprint is already created, so the stream still emits 0 to 10.
Is there anything like Observable.defer in Akka Streams?
SOLUTION:
I found a solution: use Source.lazily, to which we provide a function that is only executed once we run the stream.
I will keep the question up in case there's a better way or someone else has the same question.
@Test def defer(): Unit = {
  var range = 0 to 10
  val graphs = Source.lazily(() => Source(range))
    .to(Sink.foreach(println))
  range = 10 to 20
  graphs.run()
  Thread.sleep(2000)
}
Regards.

The simplest way would probably be Source.fromIterator(() => List(1).iterator) or something similar. In the Akka Streams API we opted to keep a minimal set of operators, so sometimes you may get into situations where the same thing is achievable as a one-liner but has no direct, named counterpart, as in defer's case here. If you think it's a common enough need, please let us know on github.com/akka/akka and we could consider adding it as an API.
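For example, here is a minimal sketch of the fromIterator approach applied to the test above (assuming Akka 2.5, where an implicit ActorMaterializer is needed; on 2.6+ the implicit ActorSystem alone suffices):

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

implicit val system: ActorSystem = ActorSystem("defer")
implicit val materializer: ActorMaterializer = ActorMaterializer()

var range = 0 to 10
val graph = Source.fromIterator(() => range.iterator)
  .to(Sink.foreach(println))
range = 10 to 20
graph.run() // prints 10 to 20: the iterator factory only runs at materialization time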
Note that there's also fromFuture and others which, while not directly related, may be useful depending on your actual use case (especially when combined with a Promise etc.).
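For instance, a hedged sketch of the Promise combination (reusing the implicit system and materializer from the snippet above): the stream materializes immediately but only emits once the Promise is completed from the outside.

import scala.concurrent.Promise

val promise = Promise[Int]()
Source.fromFuture(promise.future)
  .to(Sink.foreach(println))
  .run()

promise.success(42) // the already-running stream now emits 42 and completes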


In Flink, is it possible to use state with a non-keyed stream?

Let's assume that I have an input DataStream and want to implement some functionality that requires "memory", so I need a ProcessFunction that gives me access to state. Is it possible to do this directly on the DataStream, or is the only way to keyBy the initial stream and work in a keyed context?
I'm thinking that one solution would be to keyBy the stream with a hardcoded unique key so the whole input stream ends up in the same group. Then technically I have a KeyedStream and can use keyed state normally, as I show below with keyBy(x -> 1). But is this a good solution?
DataStream<Integer> inputStream = env.fromSource(...);
DataStream<Integer> outputStream = inputStream
    .keyBy(x -> 1)
    .process(...); // I've got access to state here
As I understand it, that's not a common use case, because the main purpose of Flink is to partition the stream, process the partitions separately, and then merge the results. In my scenario that's exactly what I'm doing, but the problem is that the merge step requires state to produce the final "global" result. What I actually want to do is something like this:
DataStream<Integer> inputStream = env.fromElements(1,2,3,4,5,6,7,8,9);
// two groups: group1=[1,2,3,4] & group2=[5,6,7,8,9]
DataStream<Integer> partialResult = inputStream
    .keyBy(val -> val / 5)
    .process(<..stateful processing..>);
// Can't do stateful processing here because partialResult is not a KeyedStream
DataStream<Integer> outputStream = partialResult
    .process(<..stateful processing..>);
outputStream.print();
But Flink doesn't seem to allow me to do the final "merge partial results" operation, because I can't access state in a process function when partialResult is not a KeyedStream.
I'm a beginner with Flink, so I hope what I'm writing makes sense.
In general, I haven't found a good way to do the "merging" step, especially when it comes to complex logic.
I hope someone can give me some info or tips, or correct me if I'm missing something.
Thank you for your time
Is "keyBy the stream with a hardcoded unique key" a good idea? Well, normally no, since it forces all data to flow through a single sub-task, so you get no benefit from the full parallelism in your Flink cluster.
If you want to get a global result (e.g. the "best" 3 results out of any generated in the preceding step), then yes, you'll have to run all records through a single sub-task. So you could use a fixed key value and a global window. But note (as the docs state) that you need to come up with some kind of trigger condition, otherwise with a streaming workflow you never know when you really have the best N results, and thus you'd never emit a final result.
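For illustration, here is a minimal sketch of that single-key merge step using Flink's Scala API (the GlobalMerge name and the running-sum logic are invented for the example; keyed state is legal here only because the stream was keyed, even if by a constant key):

import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.util.Collector

class GlobalMerge extends KeyedProcessFunction[Int, Int, Int] {
  // keyed state: one "sum" per key, and all records share the constant key
  private lazy val sum: ValueState[Int] =
    getRuntimeContext.getState(new ValueStateDescriptor[Int]("sum", classOf[Int]))

  override def processElement(value: Int,
                              ctx: KeyedProcessFunction[Int, Int, Int]#Context,
                              out: Collector[Int]): Unit = {
    val updated = sum.value() + value // an unset Int state unboxes to 0 in Scala
    sum.update(updated)
    out.collect(updated) // emit the running "global" result
  }
}

// usage (with org.apache.flink.streaming.api.scala._ in scope):
// partialResult.keyBy(_ => 1).process(new GlobalMerge)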

Akka stream: difference between map(T => Future[U]) and flatMapConcat(T => Source.fromFuture(Future[U]))

Please, what is the difference between these two approaches to defining a Sink[RandomCdr, Future[Done]]?
Flow[RandomCdr]
  .grouped(bulkSize)
  .flatMapConcat { (bulk: Seq[RandomCdr]) =>
    Source.fromFuture(collection.flatMap(_.insert[RandomCdr](false)(randomCdrWriter, ec).many(bulk)(ec))(ec))
  }
  .toMat(Sink.ignore)(Keep.right)
Flow[RandomCdr]
  .grouped(bulkSize)
  .map((bulk: Seq[RandomCdr]) => collection.flatMap(_.insert[RandomCdr](false)(randomCdrWriter, ec).many(bulk)(ec))(ec))
  .toMat(Sink.ignore)(Keep.right)
The call collection.flatMap(_.insert[RandomCdr](false)(randomCdrWriter,ec).many(bulk)(ec))(ec) returns a Future[T]; collection comes from the ReactiveMongo driver.
First snippet
Here each incoming bulk will be transformed into a Future, and that Future will be run within the execution context you provide. Only once it completes will the next bulk be processed, generating another Future, and so on.
Basically, the futures are run in sequence. This is similar in behaviour to
Flow[RandomCdr]
  .grouped(bulkSize)
  .mapAsync(parallelism = 1) { (bulk: Seq[RandomCdr]) =>
    collection.flatMap(_.insert[RandomCdr](false)(randomCdrWriter, ec).many(bulk)(ec))(ec)
  }
  .toMat(Sink.ignore)(Keep.right)
Second snippet
Here each incoming bulk will also be transformed into a Future, which will be run within the execution context you provide. But the Future will then be immediately passed to Sink.ignore and its reference thrown away.
With this approach there is no control over how many Futures will be running at the same time. For this reason this approach is not recommended.
If you're looking for improved parallelism, consider using mapAsync as shown above and tweak the parallelism parameter.
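For instance, a bounded-parallelism variant of the same pipeline (same names as in the question) could look like this: up to 4 inserts run concurrently, while backpressure and downstream ordering are preserved.

Flow[RandomCdr]
  .grouped(bulkSize)
  .mapAsync(parallelism = 4) { (bulk: Seq[RandomCdr]) =>
    // at most 4 of these Futures are in flight at any time
    collection.flatMap(_.insert[RandomCdr](false)(randomCdrWriter, ec).many(bulk)(ec))(ec)
  }
  .toMat(Sink.ignore)(Keep.right)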

Unit testing an agent

I am trying to test a MailboxProcessor in F#. I want to test that the function f I am giving it is actually executed when a message is posted.
The original code uses xUnit, but I made an .fsx version of it that I can execute using fsharpi.
So far I am doing this :
open System
open System.Threading
open System.Threading.Tasks

module MyModule =
    type Agent<'a> = MailboxProcessor<'a>

    let waitingFor timeOut (v: 'a) =
        let cts = new CancellationTokenSource(timeOut |> int)
        let tcs = new TaskCompletionSource<'a>()
        cts.Token.Register(fun _ -> tcs.SetCanceled()) |> ignore
        tcs, Async.AwaitTask tcs.Task

    type MyProcessor<'a>(f: 'a -> unit) =
        let agent = Agent<'a>.Start(fun inbox ->
            let rec loop() = async {
                let! msg = inbox.Receive()
                // some more complex logic should be used here
                f msg
                return! loop()
            }
            loop())

        member this.Post(msg: 'a) =
            agent.Post msg
open MyModule

let myTest =
    async {
        let tcs, waitingFor = waitingFor 5000 0
        let doThatWhenMessagepostedWithinAgent msg =
            tcs.SetResult(msg)
        let p = new MyProcessor<int>(doThatWhenMessagepostedWithinAgent)
        p.Post 3
        let! result = waitingFor
        return result
    }

myTest
|> Async.RunSynchronously
|> System.Console.WriteLine
// displays 3, as expected
This code works, but it doesn't feel right to me.
1) Is the use of TaskCompletionSource normal in F#, or is there something dedicated for awaiting a completion?
2) I am using a second argument in the waitingFor function in order to constrain its type. I know I could use a type MyType<'a>() to do it; is there another option? I would rather not introduce a new MyType, which I find cumbersome.
3) Is there any other way to test my agent than doing this? The only post I found so far on the subject is this blog post from 2009: http://www.markhneedham.com/blog/2009/05/30/f-testing-asynchronous-calls-to-mailboxprocessor/
This is a tough one; I've been trying to tackle it for some time as well. This is what I've found so far. It's too long for a comment, but I'd hesitate to call it a full answer either...
From simplest to most complex, it really depends on how thoroughly you want to test and how complex the agent logic is.
Your solution may be fine
What you have is fine for small agents whose only role is to serialize access to an async resource, with little or no internal state handling. If you provide the f as you do in your example, you can be pretty sure it will be called within a relatively short timeout of a few hundred milliseconds. Sure, it seems clunky, and it doubles the size of the code with all the wrappers and helpers, but those can be reused if you test more agents and/or more scenarios, so the cost gets amortized fairly quickly.
The problem I see with this is that it's not very useful if you want to verify more than that the function was called - for example, the internal agent state after calling it.
One note that applies to other parts of this answer as well: I usually start agents with a cancellation token; it makes both the production and testing life cycles easier.
Use Agent reply channels
Add an AsyncReplyChannel<'reply> to the message type and post messages using PostAndAsyncReply instead of the Post method on the Agent. It will change your agent to something like this:
type MyMessage<'a, 'b> = 'a * AsyncReplyChannel<'b>

type MyProcessor<'a, 'b>(f: 'a -> 'b) =
    // Using the MyMessage type here to simplify the signature
    let agent = Agent<MyMessage<'a, 'b>>.Start(fun inbox ->
        let rec loop() = async {
            let! (msg, replyChannel) = inbox.Receive()
            let result = f msg
            // Sending the result back to the original poster
            replyChannel.Reply result
            return! loop()
        }
        loop())

    // Notice the type change; this may be handled differently, depends on you
    member this.Post(msg: 'a) : Async<'b> =
        agent.PostAndAsyncReply(fun channel -> msg, channel)
This may seem like an artificial requirement on the agent's "interface", but it's handy for simulating a method call and it's trivial to test: await the PostAndAsyncReply (with a timeout) and you can get rid of most of the test helper code.
Since the provided function and replyChannel.Reply are separate calls, the response can also reflect the agent state, not just the function result.
Black-box model-based testing
This is what I'll talk about in most detail as I think it's most general.
In case the agent encapsulates more complex behaviour, I've found it handy to skip testing individual messages and instead use model-based tests to verify whole sequences of operations against a model of the expected external behaviour. I'm using the FsCheck.Experimental API for this.
In your case this would be doable but wouldn't make much sense, since there is no internal state to model. To give you an example of what it looks like in my particular case, consider an agent which maintains client WebSocket connections for pushing messages to the clients. I can't share the whole code, but the interface looks like this:
/// For simplicity, this adapts to the socket.Send method and makes it easy to mock
type MessageConsumer = ArraySegment<byte> -> Async<bool>

type Message =
    /// Send payload to client and expect a result of the operation
    | Send of ClientInfo * ArraySegment<byte> * AsyncReplyChannel<Result>
    /// Client connects, remember it for future Send operations
    | Subscribe of ClientInfo * MessageConsumer
    /// Client disconnects
    | Unsubscribe of ClientInfo
Internally the agent maintains a Map<ClientInfo, MessageConsumer>.
Now to test this, I can model the external behaviour in terms of an informal specification like: "sending to a subscribed client may succeed or fail depending on the result of calling the MessageConsumer function" and "sending to an unsubscribed client shouldn't invoke any MessageConsumer". So I can define types, for example like these, to model the agent:
type ConsumerType =
    | SucceedingConsumer
    | FailingConsumer
    | ExceptionThrowingConsumer

type SubscriptionState =
    | Subscribed of ConsumerType
    | Unsubscribed

type AgentModel = Map<ClientInfo, SubscriptionState>
And then I use FsCheck.Experimental to define operations for adding and removing clients with differently successful consumers and for trying to send data to them. FsCheck then generates random sequences of operations and verifies the agent implementation against the model after each step.
This does require some additional "test only" code and carries significant mental overhead at the beginning, but it lets you test relatively complex stateful logic. What I particularly like about this is that it helps me test the whole contract, not just individual functions/methods/messages, in the same way that property-based/generative testing helps test more than just a single value.
Use Actors
I haven't gone that far yet, but an alternative I've also heard of is using, for example, Akka.NET for full-fledged actor model support, and using its testing facilities, which let you run agents in special test contexts, verify expected messages, and so on. As I said, I don't have first-hand experience with it, but it seems like a viable option for more complex stateful logic (even on a single machine, not in a distributed multi-node actor system).
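For a rough idea of what those testing facilities look like, here is a minimal sketch using the JVM Akka TestKit (Scala), whose API Akka.NET mirrors; the EchoActor is invented for the example:

import akka.actor.{Actor, ActorSystem, Props}
import akka.testkit.{TestKit, TestProbe}

// a trivial agent under test: replies to the sender with the same message
class EchoActor extends Actor {
  def receive = { case msg => sender() ! msg }
}

implicit val system: ActorSystem = ActorSystem("test")
val probe = TestProbe()                 // a test actor with built-in assertions
val echo = system.actorOf(Props[EchoActor])

probe.send(echo, "ping")                // the message appears to come from the probe
probe.expectMsg("ping")                 // fails the test if no reply arrives in time

TestKit.shutdownActorSystem(system)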

Concurrency in JRuby

I'm working on a Sinatra/JRuby application and handling a situation where concurrent API calls come into the API layer and are then handled by the service and adapter layers. I have a GlobalService module to share common information, so that I can access it from any other layer. This works just fine until concurrent calls come in and reset the values set by the previous API call. I've implemented a Mutex to address this problem, but I've got a gut feeling that this is not the right approach. Here is what I've implemented:
require 'thread'

module GlobalService
  @@mutex = Mutex.new

  def self.set_header(auth_subject, transaction_id)
    @@mutex.synchronize {
      @auth_subject = auth_subject
      @transaction_id = transaction_id
    }
  end

  def self.get_header
    @@mutex.synchronize {
      return @auth_subject, @transaction_id
    }
  end
end
Please let me know of any alternative solution to address this problem.
Sharing memory will never really be thread safe, so you might do well to use a message-passing strategy. My preference would be to create actors and have them communicate about state changes. Check out Concurrent Ruby's actors: http://ruby-concurrency.github.io/concurrent-ruby/Concurrent/Actor.html.
You could do something like:
require 'concurrent'

class GlobalService < Concurrent::Actor::Context
  def initialize(auth_subject, transaction_id)
    @auth_subject = auth_subject
    @transaction_id = transaction_id
  end

  def on_message(message)
    # some logic
    @auth_subject = message.auth_subject
    @transaction_id = message.transaction_id
  end
end

service = GlobalService.spawn(:first, "header...", id)
# etc...
I've used this strategy in the past and it has worked out fairly well. You may also want to ask for help over at https://gitter.im/ruby-concurrency/concurrent-ruby - pitr-ch is usually very helpful!

Akka Gotchas when dealing with Futures

Consider the following code bit:
def receive = {
  case ComputeResult(itemId: Long) =>
    val originalSender = sender()
    computeResult(itemId).map { result =>
      originalSender ! result
    }
}
The computeResult call returns a Future, so how does the introduction of the val prevent me from sending the result to the wrong sender? Let's say I have two completely different senders (sender1 and sender2).
sender1 sends a message first, followed by sender2. Without the val in my method above, I clearly see a possibility that sender2 could get the result that was actually meant for sender1.
What I don't get is how the introduction of the val prevents the scenario I just described.
sender is actually a function (that's why the convention from Akka 2.3 onwards is to write sender()). By binding its current value to originalSender, we can close over that immutable value and know that it won't change, even if another message comes in before the Future from computeResult completes.
Because receive is a function, every invocation will result in a new local value called originalSender.
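Here is a self-contained sketch of the safe pattern (ComputeResult and computeResult are the message and method from the question; the stub computation is invented):

import akka.actor.Actor
import akka.pattern.pipe
import scala.concurrent.Future

case class ComputeResult(itemId: Long)

class Worker extends Actor {
  import context.dispatcher // ExecutionContext for Future.map / pipeTo

  // stub standing in for the real asynchronous computation
  def computeResult(itemId: Long): Future[Long] = Future { itemId * 2 }

  def receive = {
    case ComputeResult(itemId) =>
      val originalSender = sender() // evaluated now, while this message is handled
      computeResult(itemId).map { result =>
        originalSender ! result // safe: the callback closes over the captured val
      }
      // calling sender() inside the callback instead would be evaluated when the
      // Future completes, possibly while the actor is handling a later message

      // equivalent, using Akka's pipe pattern:
      // computeResult(itemId) pipeTo originalSender
  }
}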