How to stop a poll actor - akka

I'm using Akka actors for tasks that get scheduled, such as a poll going live at a scheduled date/time.
This is how I'm creating an actor...
final ActorRef pollActor = pollSystem.actorOf(new Props(
        new UntypedActorFactory() {
            public UntypedActor create() {
                return new PollActor(pollObj);
            }
        }), "pollActor" + pollObj.getId() + ":" + pollMts);
But when I update an already-created poll to change its scheduled go-live date, I create another actor, and I want the existing actor for the same poll to be stopped.
For that I'm doing this...
ActorRef pollActor = pollSystem
        .actorFor("akka://pollSystem/user/pollActor" + poll.getId() + ":" + oldPollMTS);
pollActor.tell(PoisonPill.getInstance(), null);
But the old actor is not stopped and its postStop() method is never invoked. I tried Kill.getInstance() too, but in vain.
Help me find a way to stop the old actor and the messages sent to it, so that I can then create a new actor.
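(Not from the original question, but a sketch of one way around this: instead of looking the old actor up by path with actorFor, keep the ActorRef from creation time, e.g. in a map keyed by poll id, and stop it directly before creating the replacement. The pollActors registry and reschedulePoll method below are hypothetical names; note that messages still sitting in a stopped actor's mailbox are drained to dead letters.)

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry of the live actor per poll id.
private final Map<Long, ActorRef> pollActors = new ConcurrentHashMap<>();

void reschedulePoll(final Poll pollObj, final long pollMts) {
    // Stop the previous actor for this poll, if any; postStop() runs on
    // termination and its remaining mailbox messages go to dead letters.
    final ActorRef old = pollActors.remove(pollObj.getId());
    if (old != null) {
        pollSystem.stop(old);
    }
    // The timestamp suffix keeps the new name from colliding with the
    // old actor's name while the old actor is still terminating.
    final ActorRef fresh = pollSystem.actorOf(new Props(
            new UntypedActorFactory() {
                public UntypedActor create() {
                    return new PollActor(pollObj);
                }
            }), "pollActor" + pollObj.getId() + ":" + pollMts);
    pollActors.put(pollObj.getId(), fresh);
}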

Related

MismatchingMessageCorrelationException : Cannot correlate message ‘onEventReceiver’: No process definition or execution matches the parameters

We are facing a MismatchingMessageCorrelationException for the receive task in some cases (less than 5%).
The callback that notifies the receive task is done by:
protected void respondToCallWorker(
        @NonNull final String correlationId,
        final CallWorkerResultKeys result,
        @Nullable final Map<String, Object> variables
) {
    try {
        runtimeService.createMessageCorrelation("callWorkerConsumer")
            .processInstanceId(correlationId)
            .setVariables(variables)
            .setVariable("callStatus", result.toString())
            .correlateWithResult();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
When I check the logs, I find that the executed query is this one:
select distinct RES.* from ACT_RU_EXECUTION RES
inner join ACT_RE_PROCDEF P on RES.PROC_DEF_ID_ = P.ID_
WHERE RES.PROC_INST_ID_ = 'b2362197-3bea-11eb-a150-9e4bf0efd6d0' and RES.SUSPENSION_STATE_ = '1'
and exists (select ID_ from ACT_RU_EVENT_SUBSCR EVT
where EVT.EXECUTION_ID_ = RES.ID_ and EVT.EVENT_TYPE_ = 'message'
and EVT.EVENT_NAME_ = 'callWorkerConsumer' )
Sometimes, when I look for the process instance in the database, I find it waiting in the receive task:
SELECT DISTINCT * FROM ACT_RU_EXECUTION RES
WHERE id_ = 'b2362197-3bea-11eb-a150-9e4bf0efd6d0'
However, when I check the subscription event, it has not yet been created in the database:
select ID_ from ACT_RU_EVENT_SUBSCR EVT
where EVT.EXECUTION_ID_ = 'b2362197-3bea-11eb-a150-9e4bf0efd6d0'
and EVT.EVENT_TYPE_ = 'message'
and EVT.EVENT_NAME_ = 'callWorkerConsumer'
I think the solution is to persist the "receive task" before the response arrives in respondToCallWorker, but sadly I can't figure out how.
I tried "async before" on callWorker and on "Message consumer", but it did not work.
I also tried camunda.bpm.database.jdbc-batch-processing=false and got the same results.
I also tried parallel branches, but I get an OptimisticLockingException and a MismatchingMessageCorrelationException.
Maybe I am doing it wrong.
Thanks for your help
This is an interesting problem. As you already found out, the error happens when you try to correlate the result from the "worker" before the main process has committed its transaction, so there is no message subscription registered at the time you correlate.
This problem in process orchestration is described and analyzed in this blog post, which is definitely worth reading.
Taken from that post, here is a design that should solve the issue:
You make message send and receive parallel and put an async before the send task.
By doing so, the async continuation job for the send event and the message subscription are written in the same transaction, so when the async message send executes, you already have the subscription waiting.
Although this should work and solve the issue at the BPMN model level, it might be worth considering options that do not require remodeling the process.
First, instead of calling the worker directly from your delegate, you could (assuming you are on Spring Boot) publish a "CallWorkerCommand" (a simple POJO) and use a TransactionalEventListener on a Spring bean to execute the actual call, as sketched below. By doing so, you first finish the BPMN transaction, which registers the message subscription, and afterwards Spring executes your worker call.
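A minimal sketch of that first option (my own illustration, assuming Spring Boot; CallWorkerCommand and WorkerCaller are made-up names):

import org.springframework.stereotype.Component;
import org.springframework.transaction.event.TransactionPhase;
import org.springframework.transaction.event.TransactionalEventListener;

// Simple POJO carrying whatever the worker call needs.
class CallWorkerCommand {
    private final String correlationId;
    CallWorkerCommand(String correlationId) { this.correlationId = correlationId; }
    String getCorrelationId() { return correlationId; }
}

@Component
public class WorkerCaller {
    // Runs only after the surrounding Camunda transaction has committed,
    // i.e. once the message subscription is visible in the database.
    @TransactionalEventListener(phase = TransactionPhase.AFTER_COMMIT)
    public void on(CallWorkerCommand command) {
        // perform the actual call to the worker here
    }
}

In the delegate you would then publish the event instead of calling the worker directly: applicationEventPublisher.publishEvent(new CallWorkerCommand(execution.getProcessInstanceId()));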
Second, you could use a retry mechanism like resilience4j around your correlate-message call, so in the rare cases where the result comes back too quickly, you fail and retry a second later (see the sketch below).
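A sketch of that retry variant (again my own illustration, assuming the resilience4j-retry module is on the classpath; names and timings are arbitrary):

import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;
import java.time.Duration;
import org.camunda.bpm.engine.MismatchingMessageCorrelationException;

RetryConfig config = RetryConfig.custom()
        .maxAttempts(3)
        .waitDuration(Duration.ofSeconds(1))
        .retryExceptions(MismatchingMessageCorrelationException.class)
        .build();
Retry retry = Retry.of("callWorkerCorrelation", config);

// Retries the correlation up to 3 times, waiting 1s between attempts,
// for the rare case where the result arrives before the subscription
// has been committed.
Retry.decorateRunnable(retry, () ->
        runtimeService.createMessageCorrelation("callWorkerConsumer")
                .processInstanceId(correlationId)
                .setVariables(variables)
                .setVariable("callStatus", result.toString())
                .correlateWithResult()
).run();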
Another solution I could think of, since you seem to be using an "external worker" pattern here, is to use an external-task-service task directly, so the send/receive synchronization gets solved by the Camunda external worker API.
So many options to choose from. I would probably prefer the external task, followed by the TransactionalEventListener, but that is a matter of personal preference.

Akka - force props serialization upon creation

I have an Akka cluster where actors can be created remotely and locally.
When an actor is created, it receives initial state (parameters in the example below).
I am looking for a method to clone that initial state, because when two actors are created locally they work on the same reference and have a race condition (when an actor is created remotely, it receives a serialized copy of the object).
boolean allowLocalRoutees = true;
Set<String> useRoles = Set.of("worker");
ActorRef router = context.actorOf(
        new ClusterRouterPool(
                new RoundRobinPool(POOL_SIZE),
                new ClusterRouterPoolSettings(
                        numOfWorkers, maxInstancesPerNode, allowLocalRoutees, useRoles))
            .props(Props.create(Worker.class, masterRef, parameters)));
I found two solutions, neither of which is good:
Use serialize-creators = on, which is not recommended for production and will also serialize the creators of actors that don't share the problematic state
Create a clone of shared state before the actor is initialized:
boolean allowLocalRoutees = true;
Set<String> useRoles = Set.of("worker");
Parameters clonedParameters = parameters.clone();
ActorRef router = context.actorOf(
        new ClusterRouterPool(
                new RoundRobinPool(POOL_SIZE),
                new ClusterRouterPoolSettings(
                        numOfWorkers, maxInstancesPerNode, allowLocalRoutees, useRoles))
            .props(Props.create(Worker.class, masterRef, clonedParameters)));
In this solution, parameters will be cloned even when they are sent remotely, so it is not perfect in terms of performance.
Is there any other solution I am missing?
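(Not an established pattern from the docs, just a thought: you could move the defensive copy into the Worker constructor instead. Locally created instances then each clone the shared reference, while a remotely created instance merely re-copies its already-fresh deserialized object, which is cheap compared to the serialization itself. A sketch, relying on the same Parameters.clone() used above:)

import akka.actor.AbstractActor;
import akka.actor.ActorRef;

public class Worker extends AbstractActor {
    private final ActorRef masterRef;
    private final Parameters parameters;

    public Worker(ActorRef masterRef, Parameters parameters) {
        this.masterRef = masterRef;
        // Defensive copy: locally created siblings stop sharing state;
        // a remote instance just re-copies its deserialized object.
        this.parameters = parameters.clone();
    }

    @Override
    public Receive createReceive() {
        return receiveBuilder().build(); // actual behavior omitted
    }
}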

Control number of active actors of a type

Is it possible to control the number of active actors in play? In a nutshell, I have an actor called AuthoriseVisaPaymentActor which handles the message VisaPaymentMessage. I have a parallel loop which sends 10 messages, but I want only 3 actors working simultaneously, with the other 7 messages blocked and waiting for an actor to become available. Is this possible? I am currently using a RoundRobin setup, which I believe I have misunderstood.
var actor = sys.ActorOf(
Props.Create<AuthoriseVisaPaymentActor>().WithRouter(new RoundRobinPool(1)));
actor.Tell(new VisaPaymentMessage(curr.ToString(), 9.99M, "4444"));
To set up a round-robin group, you need to specify the paths of the actors it should route to; this can be done either statically in your HOCON settings or dynamically in code (see below). A pool, by contrast, creates and manages its own routees, so the RoundRobinPool(1) in your snippet spawns exactly one worker, while new RoundRobinPool(3) would give you three. As far as messages being blocked, Akka's mailboxes already do that for you: an actor won't process a new message until the one it is currently handling is done, and pending messages simply wait in its queue until the actor is ready.
// Setup the three actors
var actor1 = sys.ActorOf(Props.Create<AuthoriseVisaPaymentActor>());
var actor2 = sys.ActorOf(Props.Create<AuthoriseVisaPaymentActor>());
var actor3 = sys.ActorOf(Props.Create<AuthoriseVisaPaymentActor>());
// Get their paths
var routees = new[] { actor1.Path.ToString(), actor2.Path.ToString(), actor3.Path.ToString() };
// Create a new actor with a router
var router = sys.ActorOf(Props.Empty.WithRouter(new RoundRobinGroup(routees)));
router.Tell(new VisaPaymentMessage(curr.ToString(), 9.99M, "4444"));

Canceling Apache Flink job from the code

I am in a situation where I want to stop/cancel a Flink job from code. This is in my integration test, where I submit a task to my Flink job and check the result. Since the job runs asynchronously, it doesn't stop even after the test passes or fails. I want the job to stop once the test is over.
I tried a few things, which I am listing below:
Get the JobManager actor
Get the running jobs
For each running job, send a cancel request to the JobManager
This, of course, is not working, but I am not sure whether the JobManager ActorRef is wrong or something else is missing.
The error I get is: [flink-akka.actor.default-dispatcher-5] [akka://flink/user/jobmanager_1] Message [org.apache.flink.runtime.messages.JobManagerMessages$RequestRunningJobsStatus$] from Actor[akka://flink/temp/$a] to Actor[akka://flink/user/jobmanager_1] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'
which means either the job manager actor ref is wrong or the message sent to it is incorrect.
The code looks like the following:
val system = ActorSystem("flink", ConfigFactory.load.getConfig("akka")) // I debugged to get this path
val jobManager = system.actorSelection("/user/jobmanager_1") // also got this akka path by debugging and getting the jobmanager akka url
val responseRunningJobs = Patterns.ask(jobManager, JobManagerMessages.getRequestRunningJobsStatus,
  new FiniteDuration(10000, TimeUnit.MILLISECONDS))
try {
  val result = Await.result(responseRunningJobs, new FiniteDuration(5000, TimeUnit.MILLISECONDS))
  if (result.isInstanceOf[RunningJobsStatus]) {
    val runningJobs = result.asInstanceOf[RunningJobsStatus].getStatusMessages()
    val itr = runningJobs.iterator()
    while (itr.hasNext) {
      val jobId = itr.next().getJobId
      val killResponse = Patterns.ask(jobManager, new CancelJob(jobId),
        new Timeout(new FiniteDuration(2000, TimeUnit.MILLISECONDS)))
      try {
        Await.result(killResponse, new FiniteDuration(2000, TimeUnit.MILLISECONDS))
      } catch {
        case e: Exception => println("Canceling the job with ID " + jobId + " failed. " + e)
      }
    }
  }
} catch {
  case e: Exception => println("Could not retrieve running jobs from the JobManager. " + e)
}
Can someone check if this is the correct approach?
EDIT:
To completely stop the job, it is necessary to stop the TaskManager along with the JobManager, stopping the TaskManager first and then the JobManager.
You're creating a new ActorSystem and then trying to find an actor named /user/jobmanager_1 in that same actor system. This won't work, since the actual JobManager runs in a different ActorSystem.
If you want to obtain an ActorRef to the real JobManager, you either have to use the same ActorSystem for the selection (then you can use a local address) or you have to find out the remote address of the JobManager actor. The remote address has the format akka.tcp://flink@[address_of_actor_system]/user/jobmanager_[instance_number]. If you have access to the FlinkMiniCluster, you can use the leaderGateway promise to obtain the current leader's ActorGateway.

Why are my requests handled by a single thread in spray-http?

I set up an HTTP server using spray-can and spray-http 1.3.2 on akka 2.3.6.
My application.conf doesn't have any akka (or spray) entries. My actor code:
class TestActor extends HttpServiceActor with ActorLogging with PlayJsonSupport {
  val route = get {
    path("clientapi" / "orders") {
      complete {{
        log.info("handling request")
        System.err.println("sleeping " + Thread.currentThread().getName)
        Thread.sleep(1000)
        System.err.println("woke up " + Thread.currentThread().getName)
        Seq[Int]()
      }}
    }
  }
  override def receive: Receive = runRoute(route)
}
started like this:
val restService = system.actorOf(Props(classOf[TestActor]), "rest-clientapi")
IO(Http) ! Http.Bind(restService, serviceHost, servicePort)
When I send 10 concurrent requests, they are all accepted immediately by spray and forwarded to different dispatcher actors (according to the akka logging config, which I have since removed from application.conf lest it influence the result), but all are handled by the same thread, which sleeps and only picks up the next request after waking up.
What should I add/change in the configuration? From what I've seen in reference.conf, the default executor is a fork-join-executor, so I'd expect all the requests to execute in parallel out of the box.
From your code I see that there is only one TestActor handling all requests, because you created just one with system.actorOf. actorOf does not create a new actor per request; moreover, you hold it in a val, so it is a single actor. That actor processes requests sequentially, one by one, and your routes run inside it. The dispatcher has no reason to pick another thread while only one thread at a time is used by that one actor, so you see only one thread in the logs (though which one is not guaranteed); I assume it's the first thread in the pool.
The fork-join executor does nothing special here except handing out the first (and always the same) free thread, as no other actors require threads in parallel with the current one, so it receives only one task at a time. Even "work stealing" doesn't help: it doesn't kick in until there is a blocked thread (marked as a managed block) to steal resources from, and Thread.sleep(1000) does not mark the thread automatically; you would have to wrap it in scala.concurrent.blocking for that. In any case, it would still be a single thread as long as you have a single actor.
If you need several actors to process the requests, just use an akka router actor (it has nothing in common with the spray router):
val restService = context.actorOf(RoundRobinPool(5).props(Props[TestActor]), "router")
That will create a pool (not a thread pool) of 5 actors to serve your requests.