RestartFlow.onFailuresWithBackoff not restarting on failure - akka

We are encountering unexpected behavior from:
RestartFlow.onFailuresWithBackoff(restartSetting)(flowFactory: () => Flow[In, Out, _])
(see the documentation for RestartFlow.onFailuresWithBackoff).
We're using:
akka 2.6.15, akka-stream-kafka_2.12 2.1.1, scala 2.12.13.
We're using the following RestartSettings:
RestartSettings(minBackoff = 1.seconds, maxBackoff = 900.seconds, randomFactor = 0.5)
  .withMaxRestarts(maxRestarts = 8, maxRestartsWithin = 1.seconds)
Expected behavior: the flow wrapped with RestartFlow should restart when a failure occurs.
Actual behavior: the wrapped flow does not restart when the failure occurs; the failure goes to the Supervision.Decider instead.
The Supervision.Decider is attached after the wrapped RestartFlow to handle the scenario where the restarts have been exhausted, so that the failure can then be handled by the Supervision.Decider.
Source -> flow1 -> flow2 -> RestartFlow.onFailuresWithBackoff(flow3)
-> flow4 + (Supervision.Decider) -> Sink
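For reference, here is a minimal, self-contained Scala sketch of the topology described above. flow1 through flow4 are hypothetical stand-ins (simple Int flows), and flow3 fails on an arbitrary element just to exercise the restart path:

import scala.concurrent.duration._
import akka.actor.ActorSystem
import akka.stream.{ActorAttributes, RestartSettings, Supervision}
import akka.stream.scaladsl.{Flow, RestartFlow, Sink, Source}

object RestartFlowSketch extends App {
  implicit val system: ActorSystem = ActorSystem("restart-flow-sketch")

  val restartSettings =
    RestartSettings(minBackoff = 1.second, maxBackoff = 900.seconds, randomFactor = 0.5)
      .withMaxRestarts(8, 1.second)

  // Hypothetical stand-ins for flow1..flow4 from the question.
  val flow1 = Flow[Int]
  val flow2 = Flow[Int]
  val flow3 = Flow[Int].map(i => if (i == 3) throw new RuntimeException("boom") else i)
  val flow4 = Flow[Int]

  // Decider placed downstream, meant to catch failures once restarts are exhausted.
  val decider: Supervision.Decider = {
    case _: RuntimeException => Supervision.Resume
    case _                   => Supervision.Stop
  }

  Source(1 to 10)
    .via(flow1)
    .via(flow2)
    .via(RestartFlow.onFailuresWithBackoff(restartSettings)(() => flow3))
    .via(flow4)
    .withAttributes(ActorAttributes.supervisionStrategy(decider))
    .runWith(Sink.foreach(println))
}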

Related

Kafka Alpakka throwing compile issues when using restart source

Trying to use restart with a source in case of a Kafka rebalance; however, the code below throws an error when used with the Alpakka Kafka library:
Caused by: java.lang.ClassCastException: akka.NotUsed$ cannot be cast to java.util.concurrent.CompletionStage
CompletionStage<Done> streamCompletion =
    RestartSource.onFailuresWithBackoff(
            restartSettings,
            () ->
                Consumer.plainSource(consumerSettings, Subscriptions.topics(topic))
                    .mapMaterializedValue(
                        c -> {
                          // this is a hack to get access to the Consumer.Control
                          // instances of the latest Kafka Consumer source
                          control.set(c);
                          return c;
                        })
                    .via(business()))
        .runWith(Sink.ignore(), system);
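For comparison, a Scala sketch of the same pattern under assumed settings: bootstrap servers, group id and topic are placeholder values, and business() from the question is replaced by a simple map:

import java.util.concurrent.atomic.AtomicReference
import scala.concurrent.Future
import scala.concurrent.duration._
import akka.Done
import akka.actor.ActorSystem
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.kafka.scaladsl.Consumer
import akka.stream.RestartSettings
import akka.stream.scaladsl.{RestartSource, Sink}
import org.apache.kafka.common.serialization.StringDeserializer

object RestartingConsumerSketch extends App {
  implicit val system: ActorSystem = ActorSystem("restarting-consumer")

  val consumerSettings =
    ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers("localhost:9092") // placeholder
      .withGroupId("example-group")           // placeholder

  val restartSettings =
    RestartSettings(minBackoff = 3.seconds, maxBackoff = 30.seconds, randomFactor = 0.2)

  // Holds the control of the most recently materialized consumer source.
  val control = new AtomicReference[Consumer.Control](Consumer.NoopControl)

  val streamCompletion: Future[Done] =
    RestartSource
      .onFailuresWithBackoff(restartSettings) { () =>
        Consumer
          .plainSource(consumerSettings, Subscriptions.topics("topic"))
          .mapMaterializedValue { c =>
            control.set(c) // keep a handle for external shutdown
            c
          }
          .map(_.value()) // stand-in for the business() stage in the question
      }
      .runWith(Sink.ignore)
}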

How to send multiple TCP messages and continue when one has succeeded

I'm writing some networking code currently and I need to send out a large number of messages and then wait for a single response.
Given that I have a function that returns the input and output channels for a socket, I have:
let resps = List.map uris ~f:(fun uri ->
    let%lwt (ic, oc) = connect uri in
    let%lwt () = Lwt_io.write_value oc msg in
    Lwt_io.read_value ic)
in
Lwt.pick resps
My understanding is that pick should cancel any ongoing requests once one of the promises in resps is fulfilled. The issue is that if any of those connections fails or is refused, a Unix.ECONNREFUSED exception is raised.
My question is: what are the correct semantics to force Lwt.pick to ignore the exceptions?
One option I've thought of so far is to catch the exception explicitly in the requests:
let resps = List.map uris ~f:(fun uri ->
    try
      let%lwt (ic, oc) = connect uri in
      let%lwt () = Lwt_io.write_value oc msg in
      Lwt_io.read_value ic
    with Unix_error (e, _, _) -> ...)
in
Lwt.pick resps
But I'm not sure under what conditions Lwt.pick will view those promises as rejected.
Update: I'm now handling the errors with cancellable, unfulfillable promises:
fst @@ Lwt.task ()
This feels hacky but seems to work so far.
Handling the exception explicitly is right. Lwt promises are rejected either when you reject them explicitly (using Lwt.fail) or when an exception is caught by Lwt in a callback that should have returned a promise (like the one you would pass to Lwt.bind).
However, for handling exceptions in code that calls into Lwt, you have to use try%lwt instead of the plain try.

Akka Stream how to determine inside GraphStageLogic whether it failed

I have a network of nodes all implemented using custom GraphStageLogic. I can't find any API to determine when a stage throws an exception (e.g. IllegalArgumentException for "Cannot pull port"). The only thing Akka does is fail the downstream connections. What I need to determine, for example in postStop or through a callback, is when a node shuts down due to a runtime exception, and to propagate that information to a Promise that monitors the state of the entire system. Using withAttributes(supervisionStrategy) does not have any effect either. It seems bewildering to me that there is no way to monitor exceptions thrown inside a GraphStageLogic; failStage is final, like basically the entire API of GraphStageLogic.
Using a decider when defining the ActorMaterializer used for materializing the graph should work:
implicit val materializer: ActorMaterializer = ActorMaterializer(
  ActorMaterializerSettings(actorSystem).withSupervisionStrategy(decider))
where decider is the typical
val decider: Supervision.Decider = {
  case e: IllegalArgumentException => ....
}
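A self-contained sketch of the approach in this answer, with a failing map stage standing in for the custom GraphStage; the println in the decider marks where you could complete a monitoring Promise:

import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ActorMaterializerSettings, Supervision}
import akka.stream.scaladsl.{Sink, Source}

object DeciderSketch extends App {
  implicit val actorSystem: ActorSystem = ActorSystem("decider-sketch")

  // The decider sees exceptions thrown by stages materialized with this materializer.
  val decider: Supervision.Decider = {
    case e: IllegalArgumentException =>
      println(s"stage failed: ${e.getMessage}") // e.g. complete a monitoring Promise here
      Supervision.Stop
    case _ => Supervision.Stop
  }

  implicit val materializer: ActorMaterializer = ActorMaterializer(
    ActorMaterializerSettings(actorSystem).withSupervisionStrategy(decider))

  Source(1 to 5)
    .map(i => if (i == 3) throw new IllegalArgumentException(s"bad element $i") else i)
    .runWith(Sink.foreach(println))(materializer)
}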

Onyx: Can't pick up trigger/emit results in the next task

I'm trying to get started with Onyx, the distributed computing platform in Clojure. In particular, I'm trying to understand how to aggregate data. If I understand the documentation correctly, a combination of a window and a :trigger/emit function should allow me to do this.
So, I modified the aggregation example (Onyx 0.13.0) in three ways (cf. gist with complete code):
In -main I println any segments put on the output channel; this works as expected with the original code in that it picks up all segments and prints them to stdout.
I add an emit function like this:
(defn make-ds
  [event window trigger {:keys [lower-bound upper-bound event-type] :as state-event} extent-state]
  (println "make-ds called")
  {:ds window})
I add a trigger configuration (original dump-words trigger omitted for brevity):
(def triggers
  [{:trigger/window-id :word-counter
    :trigger/id :make-ds
    :trigger/on :onyx.triggers/segment
    :trigger/fire-all-extents? true
    :trigger/threshold [5 :elements]
    :trigger/emit ::make-ds}])
I change the :count-words task from calling the identity function to the reduce type, so that it doesn't hand all input segments over to the output (and add config options telling Onyx to tackle this as a batch):
{:onyx/name :count-words
 ;:onyx/fn :clojure.core/identity
 :onyx/type :reduce ; :function
 :onyx/group-by-key :word
 :onyx/flux-policy :kill
 :onyx/min-peers 1
 :onyx/max-peers 1
 :onyx/batch-size 1000
 :onyx/batch-fn? true}
When I run this now, I can see in the output that the emit function (i.e. make-ds) gets called for each input segment (first output coming from the dump-words trigger of the original code):
> lein run
[....]
Om -> 1
name -> 1
My -> 2
a -> 1
gone -> 1
Coffee -> 1
to -> 1
get -> 1
Time -> 1
make-ds called
make-ds called
make-ds called
make-ds called
[....]
However, the segments built from make-ds don't make it through to the output channel; they are never printed. If I revert the :count-words task to the identity function, this works just fine. Also, it looks as if the emit function is called for each input segment, whereas I would expect it to be called only when the threshold condition is true (i.e. whenever 5 elements have been aggregated in the window).
As the test for this functionality within the Onyx code base (onyx.windowing.emit-aggregate-test) is passing just fine, I guess I'm making a stupid mistake somewhere, but I'm at a loss figuring out what.
I finally saw that there was a warning like this in the log file onyx.log:
[clojure.lang.ExceptionInfo: Windows cannot be checkpointed with ZooKeeper unless
:onyx.peer/storage.zk.insanely-allow-windowing? is set to true in the peer config.
This should only be turned on as a development convenience.
[clojure.lang.ExceptionInfo: Handling uncaught exception thrown inside task
lifecycle :lifecycle/checkpoint-state. Killing the job. -> Exception type:
clojure.lang.ExceptionInfo. Exception message: Windows cannot be checkpointed with
ZooKeeper unless :onyx.peer/storage.zk.insanely-allow-windowing? is set to true in
the peer config. This should only be turned on as a development convenience.
As soon as I set this, I finally got some segments handed over to the next task. That is, I had to change the peer config to:
(def peer-config
  {:zookeeper/address "127.0.0.1:2189"
   :onyx/tenancy-id id
   :onyx.peer/job-scheduler :onyx.job-scheduler/balanced
   :onyx.peer/storage.zk.insanely-allow-windowing? true
   :onyx.messaging/impl :aeron
   :onyx.messaging/peer-port 40200
   :onyx.messaging/bind-addr "localhost"})
Now, :onyx.peer/storage.zk.insanely-allow-windowing? doesn't sound like a good thing to rely on. On the Clojurians Slack channel, Lucas Bradstreet recommended switching to S3 checkpointing.

Can we trigger a particular activity of a specific execution of a state machine built using AWS Step function?

Using the GetActivityTask API provides access to an activity of any one random execution of a state machine that is already running. Is it possible to get a particular activity of a particular execution?
Suppose my state machine is -
Start -> A -> B -> C -> End
Execution1 : Start1 -> A1 -> B1 -> C1 -> End
Execution2 : Start2 -> A2 -> B2 -> C2 -> End
Can I get access to A1 specifically if I have the executionId (Execution1) and the activity ARN name (A)? If not, why does AWS not allow this?
Requirement: I want to create APIs like
1> SubmitRequest -
Input - some input
Output - RequestId
This starts a particular execution of the state machine and returns after activity A is executed. Behind the scenes, the machine runs B.
2> GetC -
Input - RequestId
Output - If the state machine is in the correct state to call C, we run C and provide its output; otherwise we throw an exception.
So basically I want to use AWS Step Functions to manage state for my application, and throw an exception if an API is called in an incorrect state.
If I understand your question, the GetExecutionHistory API call should do what you need. It will let you see the execution status of each state and activity and see the input and output of each.
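As a rough illustration, a Scala sketch that calls GetExecutionHistory through the AWS SDK for Java v2; the execution ARN is a placeholder, and which event types you check (e.g. ActivitySucceeded, TaskStateEntered) depends on your own state machine:

import software.amazon.awssdk.services.sfn.SfnClient
import software.amazon.awssdk.services.sfn.model.GetExecutionHistoryRequest

object ExecutionHistorySketch extends App {
  val sfn = SfnClient.create()

  // Placeholder ARN for the execution you want to inspect (e.g. Execution1).
  val request = GetExecutionHistoryRequest.builder()
    .executionArn("arn:aws:states:us-east-1:123456789012:execution:MyStateMachine:Execution1")
    .build()

  // Each HistoryEvent carries an id, a type and per-event details,
  // which is enough to check how far this particular execution has progressed.
  sfn.getExecutionHistory(request).events().forEach { event =>
    println(s"${event.id()} ${event.`type`()}")
  }

  sfn.close()
}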