I have one actor sending a message to another actor. It successfully does so multiple times, but after a few messages, the second actor stops processing the messages. The system itself isn't very loaded.
The test that reproduces the problem is:
test("case2: Primary (in isolation) should react properly to Insert, Remove, Get") {
val arbiter = TestProbe()
val primary = system.actorOf(Replica.props(arbiter.ref, Persistence.props(flaky = false)), "case2-primary")
val client = session(primary)
arbiter.expectMsg(Join)
arbiter.send(primary, JoinedPrimary)
client.getAndVerify("k1")
client.setAcked("k1", "v1")
client.getAndVerify("k1")
client.getAndVerify("k2")
client.setAcked("k2", "v2") // assertion failure happens here
client.getAndVerify("k2")
client.removeAcked("k1")
client.getAndVerify("k1")
}
Since this is part of a Coursera course, I'd rather not post my implementation.
What kinds of things might cause this failure?
We are facing a MismatchingMessageCorrelationException for the receive task in some cases (less than 5%).
The callback to notify the receive task is done by:
protected void respondToCallWorker(
    @NonNull final String correlationId,
    final CallWorkerResultKeys result,
    @Nullable final Map<String, Object> variables
) {
  try {
    runtimeService.createMessageCorrelation("callWorkerConsumer")
        .processInstanceId(correlationId)
        .setVariables(variables)
        .setVariable("callStatus", result.toString())
        .correlateWithResult();
  } catch (Exception e) {
    e.printStackTrace();
  }
}
When I check the logs, I find that the executed query is this one:
select distinct RES.* from ACT_RU_EXECUTION RES
inner join ACT_RE_PROCDEF P on RES.PROC_DEF_ID_ = P.ID_
WHERE RES.PROC_INST_ID_ = 'b2362197-3bea-11eb-a150-9e4bf0efd6d0' and RES.SUSPENSION_STATE_ = '1'
and exists (select ID_ from ACT_RU_EVENT_SUBSCR EVT
where EVT.EXECUTION_ID_ = RES.ID_ and EVT.EVENT_TYPE_ = 'message'
and EVT.EVENT_NAME_ = 'callWorkerConsumer' )
Sometimes, when I look for the process instance in the database, I find it waiting in the receive task:
SELECT DISTINCT * FROM ACT_RU_EXECUTION RES
WHERE id_ = 'b2362197-3bea-11eb-a150-9e4bf0efd6d0'
However, when I check the subscription event, it has not yet been created in the database:
select ID_ from ACT_RU_EVENT_SUBSCR EVT
where EVT.EXECUTION_ID_ = 'b2362197-3bea-11eb-a150-9e4bf0efd6d0'
and EVT.EVENT_TYPE_ = 'message'
and EVT.EVENT_NAME_ = 'callWorkerConsumer'
I think that the solution is to save the "receive task" (i.e. commit its message subscription) before the response for respondToCallWorker arrives, but sadly I can't figure out how.
I tried "async before" on the callWorker and on the message consumer, but it did not work.
I also tried camunda.bpm.database.jdbc-batch-processing=false and got the same results.
I also tried parallel branches, but I get an OptimisticLockingException and a MismatchingMessageCorrelationException.
Maybe I am doing it wrong.
Thanks for your help.
This is an interesting problem. As you already found out, the error happens when you try to correlate the result from the "worker" before the main process has committed its transaction, so there is no message subscription registered at the time you correlate.
This problem in process orchestration is described and analyzed in this blog post, which is definitely worth reading.
Taken from that post, here is a design that should solve the issue:
You make message send and receive parallel and put an async before the send task.
By doing so, the async continuation job for the send event and the message subscription are written in the same transaction, so when the async message send executes, you already have the subscription waiting.
Although this should work and solve the issue at the BPMN model level, it might be worth considering options that do not require remodeling the process.
First, instead of calling the worker directly from your delegate, you could (assuming you are on Spring Boot) publish a "CallWorkerCommand" (simple POJO) and use a TransactionalEventListener on a Spring bean to execute the actual call. By doing so, you will first finish the BPMN transaction, which registers the message subscription, and afterwards Spring will execute your worker call.
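As a rough illustration of that first option, here is a minimal Scala sketch (the same shape works in plain Java). CallWorkerCommand, CallWorkerCommandPublisher, CallWorkerListener and WorkerClient are made-up names for this example, not part of Camunda or of the code in the question:
import org.springframework.context.ApplicationEventPublisher
import org.springframework.stereotype.Component
import org.springframework.transaction.event.TransactionalEventListener

// Hypothetical command carrying whatever the worker call needs.
case class CallWorkerCommand(correlationId: String, payload: Map[String, AnyRef])

// Placeholder for however the worker is actually called today.
trait WorkerClient { def call(correlationId: String, payload: Map[String, AnyRef]): Unit }

// Called from the JavaDelegate while still inside the engine transaction.
@Component
class CallWorkerCommandPublisher(publisher: ApplicationEventPublisher) {
  def publish(correlationId: String, payload: Map[String, AnyRef]): Unit =
    publisher.publishEvent(CallWorkerCommand(correlationId, payload))
}

// Runs only after the engine transaction has committed, i.e. after the
// receive task's message subscription exists in ACT_RU_EVENT_SUBSCR.
@Component
class CallWorkerListener(workerClient: WorkerClient) {
  @TransactionalEventListener // default phase is AFTER_COMMIT
  def onCallWorker(cmd: CallWorkerCommand): Unit =
    workerClient.call(cmd.correlationId, cmd.payload)
}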
Second: you could use a retry mechanism like resilience4j around your correlate-message call, so in the rare cases where the result comes back too quickly, you fail and retry a second later.
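If retrying sounds acceptable, a hand-rolled sketch of that second option could look like the following (correlateWithRetry is a made-up helper; in practice resilience4j gives you the same idea plus backoff policies and metrics):
// Retry the correlation a few times before giving up; enough for the rare
// case where the worker result arrives before the subscription is committed.
def correlateWithRetry(attemptsLeft: Int, waitMillis: Long)(correlate: => Unit): Unit =
  try correlate
  catch {
    case _: Exception if attemptsLeft > 1 =>
      Thread.sleep(waitMillis) // crude fixed delay between attempts
      correlateWithRetry(attemptsLeft - 1, waitMillis)(correlate)
  }

// Usage inside respondToCallWorker (names taken from the question):
// correlateWithRetry(attemptsLeft = 5, waitMillis = 1000) {
//   runtimeService.createMessageCorrelation("callWorkerConsumer")
//     .processInstanceId(correlationId)
//     .setVariables(variables)
//     .setVariable("callStatus", result.toString)
//     .correlateWithResult()
// }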
Another solution I could think of, since you seem to be using an "external worker" pattern here, is to use an external-task-service task directly, so the send/receive synchronization gets solved by the Camunda external worker API.
So many options to choose from. I would possibly prefer the external task, followed by the TransactionalEventListener, but that is a matter of personal preference.
I have the following source queue definition.
lazy val (processMessageSource, processMessageQueueFuture) =
  peekMatValue(
    Source
      .queue[(ProcessMessageInputData, Promise[ProcessMessageOutputData])](5, OverflowStrategy.dropNew))

def peekMatValue[T, M](src: Source[T, M]): (Source[T, M], Future[M]) = {
  val p = Promise[M]()
  val s = src.mapMaterializedValue { m =>
    p.trySuccess(m)
    m
  }
  (s, p.future)
}
The ProcessMessageInputData class is essentially an artifact that is created when a caller calls a web server endpoint, which is hooked up to this stream (i.e. the service endpoint's business logic puts messages into this queue). The Promise of ProcessMessageOutputData is completed downstream in the sink of the application, and the web server has an onComplete callback on this future to return the response.
There are also other sources of ingress into this stream.
Now the buffer may back up when the other source overloads the system, triggering stream backpressure. The existing code just drops the new message, but I still want the process-message output promise to be completed with an exception stating something like "Throttled".
Is there a mechanism to write a custom overflow strategy, or a post processing on the overflowed element that allows me to do this?
According to https://github.com/akka/akka/blob/master/akka-stream/src/main/scala/akka/stream/impl/QueueSource.scala#L83,
dropNew would work just fine. On the client's end it would look like:
processMessageQueue.offer((in, pr)).foreach { res =>
  res match {
    case Enqueued => // Code to handle the case when the element was successfully enqueued.
    case Dropped  => // Code to handle messages that are dropped because the buffer was overflowing.
  }
}
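Building on that, a hedged sketch of how the offering side could complete the promise with a "Throttled" error when the element is dropped; the names come from the question and answer above, and the plain RuntimeException is just a placeholder for whatever exception type you prefer:
import akka.stream.QueueOfferResult
import scala.concurrent.ExecutionContext.Implicits.global

processMessageQueue.offer((in, pr)).foreach {
  case QueueOfferResult.Enqueued       => // the sink downstream will complete `pr`
  case QueueOfferResult.Dropped        => pr.failure(new RuntimeException("Throttled"))
  case QueueOfferResult.Failure(cause) => pr.failure(cause)
  case QueueOfferResult.QueueClosed    => pr.failure(new RuntimeException("Queue was closed"))
}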
I'm testing an Akka system using TestKit. One actor of the system I'm testing, upon receiving a certain message type, context.watches the sender and stops itself when the sender dies:
trait Handler extends Actor {
  override def receive: Receive = {
    case Init          => context.watch(sender)
    case _: Terminated => context.stop(self)
  }
}
In my test I'm sending
val probe = TestProbe()(system)
val target = TestActorRef(Props(classOf[Handler]))
probe.send(target, Init)
Now, to test the watch / Terminated behavior, I want to simulate the test probe being killed.
I can do
probe.send(target, Terminated)
But this presupposes that target has called context.watch(sender), else it would not receive a Terminated.
I can do
probe.testActor ! Kill
which doesn't result in a Terminated unless target has correctly called context.watch(sender), but I don't actually want the test probe killed, as it needs to remain responsive to test whether (for example) target continues to send messages instead of stopping itself.
I've come across this a few times now; what's the correct way to test whether an actor handles the above situation correctly?
You could watch the actor under test for termination with a separate probe instead of trying to do that via the 'sender' probe:
val probe = TestProbe()(system)
val deathWatcher = TestProbe()(system)
val target = TestActorRef(Props(classOf[Handler]))
deathWatcher.watch(target)
probe.send(target, Init)
// TODO make sure the message is processed.. perhaps ack it?
probe.ref ! Kill
deathWatcher.expectTerminated(target)
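If the original probe needs to stay alive for further assertions, which was the concern in the question, one variation is to let target watch a separate, sacrificial probe and kill only that one (watchedProbe is a made-up name):
val watchedProbe = TestProbe()(system) // exists only to be watched and killed
watchedProbe.send(target, Init)        // target now watches watchedProbe
// TODO as above: make sure Init has been processed before killing the probe
watchedProbe.ref ! Kill
deathWatcher.expectTerminated(target)  // target should have stopped itself
// `probe` itself was never killed and remains usable for other interactions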
Let's say I ask (?) the same actor for two responses.
It stores the sender for later.
Later, it gets messages back that need to go to those senders. We get the right sender (the one stored under the message's request id), but how does Akka know which message the response is for?
Is there something in the ActorRef that indicates which message each response is for?
Is it the 'channel'?
I'd like to understand the underlying technology better.
I'll try to read the source at the same time but I think this is a really good question.
Code example:
class TestActor
[...]
  def receive = {
    case r: MessageToGoOut ⇒
      messageId += 1
      val requestId = clientConnectionId + messageId
      senders += (requestId -> sender) // store sender for later
      anotherActor ! WrappedUpMessage(requestId, MessageOut)
    case m: MessageToGoBackToSender ⇒
      val requestId = m.requestId
      senders.get(requestId) map { client ⇒
        client ! Response(m.message)
        senders -= requestId
      }
  }
val futures = for(i <- 1 to 100) yield testActor ? new MessageToGoOut ("HEYO!" + i)
Now how does Akka ensure the responses get back to the right actor?
Every Actor has a path. From inside of the Actor, you could say:
self.path
From outside an Actor, if you had an ActorRef, you could just say:
ref.path
This path is the address of that individual actor instance, and it's how I believe the internal routing system routes messages to the mailboxes of actor instances. When you are outside of an Actor, like you are when you are looping and sending messages in your example, and you use ask (the ?), a temporary Actor instance is started up so that when the Actor that received the message needs to respond, it has a path to respond to. This is probably a bit of an oversimplification, and it might not be the level of detail that you are looking for, so I apologize if I missed the gist of your question.
Also, the sender var in an Actor is an ActorRef, thus it has a path so you can route back to it.
When a Future is created, akka creates a temporary (and addressable) Actor that is basically servicing that Future. When that temporary Actor sends to another Actor, its ActorRef is transmitted as the sender. When the receiving actor is processing that specific message, the sender var is set to the ActorRef for that temp actor, meaning that you have an address to respond to. Even if you decide to hold on to that sender for later, you still have an address to send back to and eventually complete the Future that the temporary actor is servicing. The point is, as long as you have an ActorRef, whether it's a request or a response, all it's doing is routing a message to the path for that ActorRef.
Ask (?) and tell (!) really aren't much different. Ask is basically a tell where the sender is expecting the receiver to tell a message back to it.
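A small sketch to make the temporary ask actor visible; EchoPath, the system name and the message are made up for the example, and the exact /temp path shown in the comment is just what classic Akka typically prints:
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._

class EchoPath extends Actor {
  def receive = {
    case msg =>
      // With ask, this prints something like akka://demo/temp/$a,
      // i.e. the path of the temporary actor servicing the Future.
      println(s"got '$msg' from ${sender().path}")
      sender() ! msg
  }
}

object AskPathDemo extends App {
  implicit val timeout: Timeout = Timeout(3.seconds)
  val system = ActorSystem("demo")
  val echo = system.actorOf(Props[EchoPath], "echo")
  echo ? "hello" // the reply is routed back to the temporary actor's path and completes the Future
}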
I'm new to Akka 2. The following is my question:
There is a server actor and several client actors.
The server stores the refs of all the client actors.
I wonder how the server can detect which client is disconnected (shutdown, crash, ...).
And if there is a way to tell the clients that the server is dead.
There are two ways to interact with an actor's lifecycle. First, the parent of an actor defines a supervisory policy that handles actor failures and has the option to restart, stop, resume, or escalate after a failure. In addition, a non-supervisor actor can "watch" an actor to detect the Terminated message generated when the actor dies. This section of the docs covers the topic: http://doc.akka.io/docs/akka/2.0.1/general/supervision.html
Here's an example of using watch from a spec. I start an actor, then set up a watcher for the Termination. When the actor gets a PoisonPill message, the event is detected by the watcher:
"be able to watch the proxy actor fail" in {
val myProxy = system.actorOf(Props(new VcdRouterActor(vcdPrivateApiUrl, vcdUser, vcdPass, true, sessionTimeout)), "vcd-router-" + newUuid)
watch(myProxy)
myProxy ! PoisonPill
expectMsg(Terminated(`myProxy`))
}
Here's an example of a custom supervisor strategy that Stops the child actor if it failed due to an authentication exception since that probably will not be correctable, or escalates the failure to a higher supervisor if the failure was for another reason:
override val supervisorStrategy = OneForOneStrategy(maxNrOfRetries = 5, withinTimeRange = 1 minute) {
  // authentication failures probably can't be corrected by retrying, so stop the proxy router for this host.
  case e: AuthenticationException ⇒
    log.error(e.getMessage + " Stopping proxy router for this host")
    Stop
  // don't know what it was, escalate it.
  case e: Exception ⇒
    log.warning("Unknown exception from vCD proxy. Escalating a {}", e.getClass.getName)
    Escalate
}
Within an actor, you can generate the failure by throwing an exception, or trigger termination (and thus a Terminated for any watchers) by processing a PoisonPill message.
Another pattern that may be useful if you don't want to generate a failure is to respond with a failure to the sender. Then you can have a more personal message exchange with the caller. For example, the caller can use the ask pattern and use an onComplete block for handling the response. Caller side:
vcdRouter ? DisableOrg(id) mapTo manifest[VcdHttpResponse] onComplete {
  case Left(failure) ⇒ log.info("received a failure message")
  case Right(success) ⇒ log.info("org disabled")
}
Callee side:
val org0 = UUID.fromString("00000000-0000-0000-0000-000000000000")
def receive = {
  case DisableOrg(id: UUID) if id == org0 => sender ! Failure(new IllegalArgumentException("can't disable org 0"))
  case DisableOrg(id: UUID) => sender ! disableOrg(id)
}
In order to make your server react to changes of remote client status you could use something like the following (example is for Akka 2.1.4).
In Java
@Override
public void preStart() {
  context().system().eventStream().subscribe(getSelf(), RemoteLifeCycleEvent.class);
}
Or in Scala
override def preStart = {
context.system.eventStream.subscribe(listener, classOf[RemoteLifeCycleEvent])
}
If you're only interested in when the client is disconnected, you could register only for RemoteClientDisconnected.
More info here (Java) and here (Scala).
In the upcoming Akka 2.2 release (RC1 was released yesterday), Death Watch works both locally and remotely. If you watch the root guardian on the other system, then when you get Terminated for it, you know that the remote system is down.
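For the original question that means the server can simply context.watch the client ActorRefs it already stores, and clients can do the same with the server's ActorRef to learn when the server is gone. A minimal sketch, assuming remoting is configured and Register is a made-up registration message:
import akka.actor.{Actor, ActorRef, Terminated}

case object Register // hypothetical message each client sends to the server

class ServerActor extends Actor {
  private var clients = Set.empty[ActorRef]

  def receive = {
    case Register =>
      clients += sender()
      context.watch(sender()) // remote death watch (Akka 2.2+)
    case Terminated(clientRef) =>
      clients -= clientRef // client stopped, crashed, or became unreachable
  }
}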
Hope that helps!