When an actor fails, I need to send the cause of the failure to another actor.
I know there are supervision strategies, and I use them. The problem is that I cannot find the right place for such error reporting.
I tried watching the actor, but the Terminated message does not provide the cause of termination.
Currently, I have added error handling in the Decider:
override def supervisorStrategy: SupervisorStrategy =
  OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = Duration(1, TimeUnit.SECONDS), loggingEnabled = true) {
    case e: Exception =>
      onActorError(sender(), e)
      Stop
  }
But I don't think this is the right time and place to do so: the Decider should return a strategy, not implicitly do something else as a side effect.
So the question is: is there a proper place to catch actor exceptions and do something about it?
The postRestart method of the supervised actor seems like a good place to do the post-mortem logging.
From the documentation:
The new actor’s postRestart method is invoked with the exception which
caused the restart. By default the preStart is called, just as in the
normal start-up case.
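For illustration, a minimal sketch of that approach; the reporter ref and the ActorFailed message are hypothetical stand-ins for wherever you want the failure cause to go:
import akka.actor.{Actor, ActorRef}

case class ActorFailed(failed: ActorRef, cause: Throwable)

class Worker(reporter: ActorRef) extends Actor {
  // invoked on the fresh instance after a restart, with the exception that caused it
  override def postRestart(reason: Throwable): Unit = {
    reporter ! ActorFailed(self, reason)
    super.postRestart(reason) // keeps the default behaviour of calling preStart
  }

  def receive = {
    case msg => // normal processing
  }
}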
We are facing a MismatchingMessageCorrelationException for the receive task in some cases (less than 5%).
The callback to notify the receive task is done by:
protected void respondToCallWorker(
        @NonNull final String correlationId,
        final CallWorkerResultKeys result,
        @Nullable final Map<String, Object> variables
) {
    try {
        runtimeService.createMessageCorrelation("callWorkerConsumer")
            .processInstanceId(correlationId)
            .setVariables(variables)
            .setVariable("callStatus", result.toString())
            .correlateWithResult();
    } catch(Exception e) {
        e.printStackTrace();
    }
}
When I check the logs, I find that the executed query is this one:
select distinct RES.* from ACT_RU_EXECUTION RES
inner join ACT_RE_PROCDEF P on RES.PROC_DEF_ID_ = P.ID_
WHERE RES.PROC_INST_ID_ = 'b2362197-3bea-11eb-a150-9e4bf0efd6d0' and RES.SUSPENSION_STATE_ = '1'
and exists (select ID_ from ACT_RU_EVENT_SUBSCR EVT
where EVT.EXECUTION_ID_ = RES.ID_ and EVT.EVENT_TYPE_ = 'message'
and EVT.EVENT_NAME_ = 'callWorkerConsumer' )
Sometimes, when I look for the process instance in the database, I find it waiting at the receive task:
SELECT DISTINCT * FROM ACT_RU_EXECUTION RES
WHERE id_ = 'b2362197-3bea-11eb-a150-9e4bf0efd6d0'
However, when I check for the event subscription, it has not yet been created in the database:
select ID_ from ACT_RU_EVENT_SUBSCR EVT
where EVT.EXECUTION_ID_ = 'b2362197-3bea-11eb-a150-9e4bf0efd6d0'
and EVT.EVENT_TYPE_ = 'message'
and EVT.EVENT_NAME_ = 'callWorkerConsumer'
I think the solution is to persist the "receive task" before the response arrives in respondToCallWorker, but sadly I can't figure out how.
I tried "async before" on callWorker and on "Message consumer", but it did not work.
I also tried camunda.bpm.database.jdbc-batch-processing=false and got the same results.
I also tried parallel branches, but then I get an OptimisticLockingException and a MismatchingMessageCorrelationException.
Maybe I am doing it wrong.
Thanks for your help!
This is an interesting problem. As you already found out, the error happens when you try to correlate the result from the "worker" before the main process has committed its transaction, so there is no message subscription registered at the time you correlate.
This problem in process orchestration is described and analyzed in this blog post, which is definitely worth reading.
Taken from that post, here is a design that should solve the issue:
You make the message send and receive parallel and put an "async before" on the send task.
By doing so, the async continuation job for the send event and the message subscription are written in the same transaction, so when the async message send executes, you already have the subscription waiting.
Although this should work and solve the issue at the BPMN model level, it might be worth considering options that do not require remodeling the process.
First, instead of calling the worker directly from your delegate, you could (assuming you are on Spring Boot) publish a "CallWorkerCommand" (a simple POJO) and use a TransactionalEventListener on a Spring bean to execute the actual call. By doing so, you first finish the BPMN transaction, which writes the message subscription, and only afterwards does Spring execute your worker call; see the sketch below.
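A rough sketch of that idea (CallWorkerCommand, WorkerClient and the bean names are hypothetical; written in Scala here, but the same shape works as plain Java beans):
import org.springframework.context.ApplicationEventPublisher
import org.springframework.stereotype.Component
import org.springframework.transaction.event.TransactionalEventListener

// hypothetical command and worker-call abstraction
case class CallWorkerCommand(processInstanceId: String, variables: java.util.Map[String, AnyRef])

trait WorkerClient { def call(cmd: CallWorkerCommand): Unit }

@Component
class CallWorkerPublisher(publisher: ApplicationEventPublisher) {
  // called from the JavaDelegate, still inside the engine's transaction
  def requestWorkerCall(cmd: CallWorkerCommand): Unit = publisher.publishEvent(cmd)
}

@Component
class CallWorkerListener(workerClient: WorkerClient) {
  // default phase is AFTER_COMMIT, so by the time this runs the receive
  // task's message subscription has already been written to the database
  @TransactionalEventListener
  def onCallWorker(cmd: CallWorkerCommand): Unit = workerClient.call(cmd)
}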
Second, you could use a retry mechanism like resilience4j around your message correlation call, so that in the rare cases where the result comes back too quickly, you fail and retry a second later.
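A sketch of that option, assuming resilience4j's Retry API and the same runtimeService, correlationId, variables and result already in scope in respondToCallWorker (attempt count and delay are arbitrary; written in Scala here, but it maps one-to-one to Java):
import java.time.Duration
import io.github.resilience4j.retry.{Retry, RetryConfig}
import org.camunda.bpm.engine.MismatchingMessageCorrelationException

// retry the correlation a few times while the subscription may not exist yet
val retryConfig = RetryConfig.custom()
  .maxAttempts(5)
  .waitDuration(Duration.ofSeconds(1))
  .retryExceptions(classOf[MismatchingMessageCorrelationException])
  .build()

val correlateRetry = Retry.of("callWorkerConsumer-correlate", retryConfig)

// same correlation call as in respondToCallWorker, wrapped in the retry
correlateRetry.executeRunnable { () =>
  runtimeService.createMessageCorrelation("callWorkerConsumer")
    .processInstanceId(correlationId)
    .setVariables(variables)
    .setVariable("callStatus", result.toString)
    .correlateWithResult()
}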
Another solution I can think of, since you seem to be using an "external worker" pattern here, is to use an external task directly, so the send/receive synchronization is handled by the Camunda External Task API.
So many options to choose from. I would probably prefer the external task, followed by the TransactionalEventListener, but that is a matter of personal preference.
I have a network of nodes, all implemented as custom GraphStageLogic. I can't find any API to determine when a stage throws an exception (e.g. an IllegalArgumentException for "Cannot pull port"). The only thing Akka does is fail the downstream connections. What I need to determine, for example in postStop or through a callback, is when a node shuts down due to a runtime exception, and to propagate that information to a Promise that monitors the state of the entire system. Using withAttributes(supervisionStrategy) does not have any effect either. It seems bewildering to me that there is no way to monitor exceptions thrown inside a GraphStageLogic; failStage is final, like basically the entire GraphStageLogic API.
Using a decider when defining the ActorMaterializer used for materializing the graph should work:
implicit val materializer: ActorMaterializer = ActorMaterializer(
  ActorMaterializerSettings(actorSystem).withSupervisionStrategy(decider))
where decider is the typical
val decider: Supervision.Decider = {
  case e: IllegalArgumentException => ....
}
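A slightly fuller sketch of how the decider can feed the Promise you mentioned (the systemState promise and the blanket Stop handling are assumptions about your setup):
import scala.concurrent.Promise
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ActorMaterializerSettings, Supervision}

implicit val actorSystem: ActorSystem = ActorSystem("graph")

// hypothetical promise that the rest of the system watches
val systemState = Promise[Unit]()

val decider: Supervision.Decider = {
  case e: IllegalArgumentException =>
    systemState.tryFailure(e) // propagate the failure cause
    Supervision.Stop
  case other =>
    systemState.tryFailure(other)
    Supervision.Stop
}

implicit val materializer: ActorMaterializer = ActorMaterializer(
  ActorMaterializerSettings(actorSystem).withSupervisionStrategy(decider))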
I have one actor sending a message to another actor. It successfully does so multiple times, but after a few messages, the second actor stops processing the messages. The system itself isn't very loaded.
The test that reproduces the problem is:
test("case2: Primary (in isolation) should react properly to Insert, Remove, Get") {
val arbiter = TestProbe()
val primary = system.actorOf(Replica.props(arbiter.ref, Persistence.props(flaky = false)), "case2-primary")
val client = session(primary)
arbiter.expectMsg(Join)
arbiter.send(primary, JoinedPrimary)
client.getAndVerify("k1")
client.setAcked("k1", "v1")
client.getAndVerify("k1")
client.getAndVerify("k2")
client.setAcked("k2", "v2") // assertion failure happens here
client.getAndVerify("k2")
client.removeAcked("k1")
client.getAndVerify("k1")
}
Since this is part of a Coursera course, I'd rather not post my implementation.
What kinds of things might cause this failure?
I have been trying to use C++/CX StorageFile::ReadAsync() to read a file in a Store app, but it always returns an invalid-parameter exception no matter what.
// "file" are returned from FileOpenPicker
IRandomAccessStream^ reader = create_task(file->OpenAsync(FileAccessMode::Read)).get();
if (reader->CanRead)
{
BitmapImage^ b = ref new BitmapImage();
const int count = 1000000;
Streams::Buffer^ bb = ref new Streams::Buffer(count);
create_task(reader->ReadAsync(bb, 1, Streams::InputStreamOptions::None)).get();
}
I have turned on all the manifest capabilities and added "File Open Picker" + "File Type Associations" under Declarations. Any ideas? Thanks!
PS: most solutions I found are for C#, but the code structure is similar...
If this code is executing on the UI thread (or in any other Single Threaded Apartment, or STA), then the calls to .get() will throw if the tasks have not yet completed, because the call to .get() would block the thread. You must not block the UI thread or any other STA, and when compiling with C++/CX support enabled, the libraries enforce this.
If you turn on first chance exception handling in the debugger (Debug -> Exceptions..., check the C++ Exceptions check box), you should see that the first exception to be thrown is an invalid_operation exception, from the following line in <ppltasks.h>:
// In order to prevent Windows Runtime STA threads from blocking the UI, calling
// task.wait() task.get() is illegal if task has not been completed.
if (!_IsCompleted() && !_IsCanceled())
{
    throw invalid_operation("Illegal to wait on a task in a Windows Runtime STA");
}
The "invalid parameter" you are reporting is the fatal error that is caused when this exception reaches the ABI boundary: the debugger is notified that the application is about to terminate because this exception was unhandled.
You need to restructure your code to use continuations, using task::then, as described in the article Asynchronous Programming in C++ Using PPL
Just to make sure you understand the async pattern, what is happening in your code is that you call create_task and immediately after that task has started you are trying to get the result with .get(). Calls to .get() will throw immediately if the task is still running or the file could not be found. Therefore, the correct way of structuring this is using a .then on your file task, ensuring that you have the result of this task before starting the next one.
create_task(file->OpenAsync(FileAccessMode::Read)).then([](IRandomAccessStream^ reader)
{
    // do stuff with the reader
});
At that point the reader is available so you can do whatever you want to, even start a new task.
Also, it is possible that the call to OpenAsync is failing because the file is empty; I would add a try/catch block to the previous task, the one that gets the file, just to make sure that's not the problem.
I'm new to Akka 2. The following is my question:
There is a server actor and several client actors.
The server stores the refs of all the client actors.
I wonder how the server can detect which client is disconnected (shut down, crashed, ...),
and whether there is a way to tell the clients that the server is dead.
There are two ways to interact with an actor's lifecycle. First, the parent of an actor defines a supervisory policy that handles actor failures and has the option to restart, stop, resume, or escalate after a failure. In addition, a non-supervisor actor can "watch" an actor to detect the Terminated message generated when the actor dies. This section of the docs covers the topic: http://doc.akka.io/docs/akka/2.0.1/general/supervision.html
Here's an example of using watch from a spec. I start an actor, then set up a watch for the Terminated message. When the actor gets a PoisonPill, the termination is detected by the watcher:
"be able to watch the proxy actor fail" in {
val myProxy = system.actorOf(Props(new VcdRouterActor(vcdPrivateApiUrl, vcdUser, vcdPass, true, sessionTimeout)), "vcd-router-" + newUuid)
watch(myProxy)
myProxy ! PoisonPill
expectMsg(Terminated(`myProxy`))
}
Here's an example of a custom supervisor strategy that stops the child actor if it failed due to an authentication exception (since that probably will not be correctable), or escalates the failure to a higher-level supervisor if the failure was for any other reason:
override val supervisorStrategy = OneForOneStrategy(maxNrOfRetries = 5, withinTimeRange = 1 minutes) {
  // an authentication failure is probably not recoverable, so stop the child
  case e: AuthenticationException =>
    log.error(e.message + " Stopping proxy router for this host")
    Stop
  // don't know what it was, escalate it
  case e: Exception =>
    log.warning("Unknown exception from vCD proxy. Escalating a {}", e.getClass.getName)
    Escalate
}
Within an actor, you can trigger these events by throwing an exception (a failure that is handled by the supervisor) or by processing a PoisonPill message (a clean stop that watchers see as Terminated).
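For example, a hypothetical child that fails by throwing the AuthenticationException the strategy above handles (assuming that exception takes a message string):
class VcdProxyWorker extends Actor {
  def receive = {
    // throwing here suspends the child and invokes the parent's supervisorStrategy
    case "login" => throw new AuthenticationException("bad credentials")
    case msg     => // normal processing
  }
}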
Another pattern that may be useful, if you don't want to generate a failure, is to respond with a failure to the sender. Then you can have a more direct message exchange with the caller. For example, the caller can use the ask pattern and an onComplete block for handling the response. Caller side:
vcdRouter ? DisableOrg(id) mapTo manifest[VcdHttpResponse] onComplete {
  case Left(failure) => log.info("received a failure message")
  case Right(success) => log.info("org disabled")
}
Callee side:
val org0 = new UUID("00000000-0000-0000-0000-000000000000")
def receive = {
  case DisableOrg(id: UUID) if id == org0 => sender ! Failure(new IllegalArgumentException("can't disable org 0"))
  case DisableOrg(id: UUID) => sender ! disableOrg(id)
}
In order to make your server react to changes in a remote client's status, you could use something like the following (the example is for Akka 2.1.4).
In Java
@Override
public void preStart() {
    context().system().eventStream().subscribe(getSelf(), RemoteLifeCycleEvent.class);
}
Or in Scala
override def preStart = {
  context.system.eventStream.subscribe(listener, classOf[RemoteLifeCycleEvent])
}
If you're only interested in when a client is disconnected, you could register only for RemoteClientDisconnected.
More info here (Java) and here (Scala).
In the upcoming Akka 2.2 release (RC1 was released yesterday), death watch works both locally and remotely. If you watch the root guardian on the other system, then when you get Terminated for it, you know that the remote system is down.
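A rough sketch of remote death watch along those lines (the watcher class, the Identify handshake used to resolve the remote ref, and what you do on Terminated are all assumptions about your setup):
import akka.actor.{Actor, ActorIdentity, ActorLogging, Identify, Terminated}

// watches an actor on the remote system and treats its Terminated as
// "the remote system is gone"
class RemoteSystemWatcher(remotePath: String) extends Actor with ActorLogging {
  override def preStart(): Unit =
    context.actorSelection(remotePath) ! Identify("remote")

  def receive = {
    case ActorIdentity("remote", Some(ref)) =>
      context.watch(ref) // works across the network in Akka 2.2
    case ActorIdentity("remote", None) =>
      log.warning("Nothing found at {}", remotePath)
    case Terminated(ref) =>
      log.warning("Remote system hosting {} appears to be down", ref)
  }
}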
Hope that helps!