DeadlineExceededException when creating tasks on startup - google-cloud-platform

I have a Spring Boot 2.4.5 application deployed on Google Cloud Run (image created with Jib). On startup I want to create a Cloud Task but I get a DeadlineExceededException.
If I run the same task creation code triggered by an HTTP request instead, the task is created, and the task that was supposed to be created on startup is also created. It's as if something is missing at startup that prevents the task from being created.
The startup event
@EventListener(ApplicationReadyEvent.class)
public void doSomethingAfterStartup() {
    LOGGER.info("ApplicationReadyEvent");
    String message = "GCP New Instance Start " + Instant.now();
    cloudTasksService.createTask("xxxx", "us-central1", "xxxx", message, 60);
}
The task creation code
public void createTask(String projectId, String locationId, String queueId, String message, Integer delay) throws IOException {
    try (CloudTasksClient client = CloudTasksClient.create()) {
        LOGGER.info("Client created");
        String url = "xxxxxxxxx";
        String payload = String.format("{ \"text\": \"%s\"}", message);
        String queuePath = QueueName.of(projectId, locationId, queueId).toString();
        Instant eta = Instant.now().plusSeconds(delay);
        Task.Builder taskBuilder =
            Task.newBuilder()
                .setScheduleTime(Timestamp.newBuilder().setSeconds(eta.getEpochSecond()).build())
                .setHttpRequest(
                    HttpRequest.newBuilder()
                        .setBody(ByteString.copyFrom(payload, Charset.defaultCharset()))
                        .setUrl(url)
                        .setHttpMethod(HttpMethod.POST)
                        .build());
        LOGGER.info("TaskBuilder ready");
        Task task = client.createTask(queuePath, taskBuilder.build());
        LOGGER.info("Task created: {}", task.getName());
    }
}
The HTTP endpoint
@GetMapping("/tasks")
public ResponseEntity<Void> task(@RequestParam Integer delay) throws IOException {
    cloudTasksService.createTask("xxxx", "us-central1", "xxxx", "using HTTP request", delay);
    return ResponseEntity.accepted().build();
}
The exception
com.google.api.gax.rpc.DeadlineExceededException: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: Deadline exceeded after 5.200272920s.
at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:51)
at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72)
at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60)
at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)
at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:68)
at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1074)
at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1213)
at com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:724)
at com.google.common.util.concurrent.ForwardingListenableFuture.addListener(ForwardingListenableFuture.java:45)
at com.google.api.core.ApiFutureToListenableFuture.addListener(ApiFutureToListenableFuture.java:52)
at com.google.common.util.concurrent.Futures.addCallback(Futures.java:1047)
at com.google.api.core.ApiFutures.addCallback(ApiFutures.java:63)
at com.google.api.gax.grpc.GrpcExceptionCallable.futureCall(GrpcExceptionCallable.java:67)
at com.google.api.gax.rpc.UnaryCallable$1.futureCall(UnaryCallable.java:126)
at com.google.api.gax.tracing.TracedUnaryCallable.futureCall(TracedUnaryCallable.java:75)
at com.google.api.gax.rpc.UnaryCallable$1.futureCall(UnaryCallable.java:126)
at com.google.api.gax.rpc.UnaryCallable.futureCall(UnaryCallable.java:87)
at com.google.api.gax.rpc.UnaryCallable.call(UnaryCallable.java:112)
at com.google.cloud.tasks.v2.CloudTasksClient.createTask(CloudTasksClient.java:1915)
at com.google.cloud.tasks.v2.CloudTasksClient.createTask(CloudTasksClient.java:1885)
at com.sps.playground.CloudTasksService.createTask(CloudTasksService.java:55)

It looks like the worker instance is not fully ready when the task is being queued. I'd recommend not creating tasks on startup, since this can often fail when the instance has not yet passed its readiness check at the moment the request is made. That would also explain why the task is created normally when triggered by HTTP.
You can tackle this by reducing your startup time, following the Cloud Run startup recommendations. Also, as you are using Java with Spring Boot, it may be worth checking the "Reducing startup tasks" recommendations as well.
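If the call has to stay on the startup path, another option (not part of the recommendations above) is to give the createTask RPC a longer deadline than the roughly 5 seconds seen in the stack trace. A minimal sketch, assuming the standard GAPIC settings pattern; the 30-second value and the helper method name are arbitrary:

import com.google.cloud.tasks.v2.CloudTasksClient;
import com.google.cloud.tasks.v2.CloudTasksSettings;
import java.io.IOException;
import org.threeten.bp.Duration;

public CloudTasksClient createClientWithLongerDeadline() throws IOException {
    CloudTasksSettings.Builder settingsBuilder = CloudTasksSettings.newBuilder();
    // Widen the per-call timeout for createTask before building the client.
    settingsBuilder
        .createTaskSettings()
        .setRetrySettings(
            settingsBuilder
                .createTaskSettings()
                .getRetrySettings()
                .toBuilder()
                .setInitialRpcTimeout(Duration.ofSeconds(30))
                .setMaxRpcTimeout(Duration.ofSeconds(30))
                .setTotalTimeout(Duration.ofSeconds(30))
                .build());
    return CloudTasksClient.create(settingsBuilder.build());
}

Note that this does not remove the underlying cause (the instance being busy during startup); it only gives the request more room to complete.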

Related

GCP Cloud Tasks: shorten period for creating a previously created named task

We are developing a GCP Cloud Task based queue process that sends a status email whenever a particular Firestore doc write-trigger fires. The reason we use Cloud Tasks is so a delay can be created (using scheduledTime property 2-min in the future) before the email is sent, and to control dedup (by using a task-name formatted as: [firestore-collection-name]-[doc-id]) since the 'write' trigger on the Firestore doc can be fired several times as the document is being created and then quickly updated by backend cloud functions.
Once the task's delay period has been reached, the cloud-task runs, and the email is sent with updated Firestore document info included. After which the task is deleted from the queue and all is good.
Except:
If the user updates the Firestore doc (say 20 or 30 min later) we want to resend the status email but are unable to create the task using the same task-name. We get the following error:
409 The task cannot be created because a task with this name existed too recently. For more information about task de-duplication see https://cloud.google.com/tasks/docs/reference/rest/v2/projects.locations.queues.tasks/create#body.request_body.FIELDS.task.
This was unexpected, as the queue is empty at this point because the last task completed successfully. The documentation referenced in the error message says:
If the task's queue was created using Cloud Tasks, then another task
with the same name can't be created for ~1hour after the original task
was deleted or executed.
Question: is there some way in which this restriction can be bypassed by lowering the amount of time, or even removing the restriction altogether?
The short answer is no. As you've already pointed out, the docs are very clear regarding this behavior: you have to wait about 1 hour to create a task with the same name as one that was previously created. Neither the API nor the client libraries allow you to decrease this time.
Having said that, I would suggest that instead of reusing the same task ID, you use a different one for each task and add an identifier in the body of the request. For example, using Python:
from google.cloud import tasks_v2
from google.protobuf import timestamp_pb2
import datetime

def create_task(project, queue, location, payload=None, in_seconds=None):
    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path(project, location, queue)
    task = {
        'app_engine_http_request': {
            'http_method': 'POST',
            'relative_uri': '/task/' + queue
        }
    }
    if payload is not None:
        converted_payload = payload.encode()
        task['app_engine_http_request']['body'] = converted_payload
    if in_seconds is not None:
        d = datetime.datetime.utcnow() + datetime.timedelta(seconds=in_seconds)
        timestamp = timestamp_pb2.Timestamp()
        timestamp.FromDatetime(d)
        task['schedule_time'] = timestamp
    response = client.create_task(parent, task)
    print('Created task {}'.format(response.name))
    print(response)

# You can change DOCUMENT_ID to USER_ID or something else to identify the task
create_task(PROJECT_ID, QUEUE, REGION, DOCUMENT_ID)
Facing a similar problem of needing to debounce multiple firings of a Firestore write-trigger function, we worked around the default Cloud Tasks task-name-based dedup mechanism (still a constraint as of Nov 2022) by building a small debounce "helper" using Firestore transactions.
We're using a helper collection _syncHelper_ to implement a delayed throttle for side effects of write-trigger fires - in the OP's case, sending one email for all writes within 2 minutes.
In our case we are using the Firebase Functions task queue utilities rather than interacting with Cloud Tasks directly, but that's immaterial to the solution. The key is to determine the task's execution time in advance and use that as the "dedup key":
async function enqueueTask(shopId) {
  const queueName = 'doSomething';
  const now = new Date();
  const next = new Date(now.getTime() + 2 * 60 * 1000);
  try {
    const shouldEnqueue = await getFirestore().runTransaction(async t => {
      const syncRef = getFirestore().collection('_syncHelper_').doc(<collection_id-doc_id>);
      const doc = await t.get(syncRef);
      let data = doc.data();
      if (data?.timestamp.toDate() > now) {
        return false;
      }
      await t.set(syncRef, { timestamp: Timestamp.fromDate(next) });
      return true;
    });
    if (shouldEnqueue) {
      let queue = getFunctions().taskQueue(queueName);
      await queue.enqueue(
        { timestamp: next.toISOString() },
        { scheduleTime: next }
      );
    }
  } catch {
    ...
  }
}
This will ensure a new task is enqueued only if the "next execution" time has passed.
The execution operation (also a cloud function in our case) will remove the sync data entry if it hasn't been changed since it was executed:
exports.doSomething = functions.tasks.taskQueue({
  retryConfig: {
    maxAttempts: 2,
    minBackoffSeconds: 60,
  },
  rateLimits: {
    maxConcurrentDispatches: 2,
  }
}).onDispatch(async data => {
  let { timestamp } = data;
  await sendYourEmailHere();
  await getFirestore().runTransaction(async t => {
    const syncRef = getFirestore().collection('_syncHelper_').doc(<collection_id-doc_id>);
    const doc = await t.get(syncRef);
    let data = doc.data();
    if (data?.timestamp.toDate() <= new Date(timestamp)) {
      await t.delete(syncRef);
    }
  });
});
This isn't a bulletproof solution (if the doSomething() execution function has high latency, for example), but it's good enough for 99% of our use cases.

Get tasks status in AWS Step Functions (boto3)

I am currently using boto3 (the Amazon Web Services (AWS) SDK for Python) to create state machines, start executions and also in my workers to retrieve tasks and report their status (completed successfully or failed).
I have another service that needs to know the tasks' status and I would like to do so by retrieving it from AWS. I searched the available methods and it is only possible to get the status of a state machine/execution as a whole (RUNNING|SUCCEEDED|FAILED|TIMED_OUT|ABORTED).
There is also the get_execution_history method, but each step is identified by a sequentially numbered id and there is no information about the task itself (only the "stateEnteredEventDetails" event contains the name of the task, and the subsequent events may not be related to it, so it is impossible to know whether the task was successful or not).
Is it really not possible to retrieve the status of a specific task, or am I missing something?
Thank you!
I had the same problem, and it seems that Step Functions does not treat states and tasks as entities, so there is no API to get information about them.
In order to get the task's status you need to parse the information in the execution history. In my case I first check the execution status:
import boto3
import json

client = boto3.client("stepfunctions")

response = client.describe_execution(
    executionArn=EXECUTION_ARN
)
status = response["status"]
and if it is "FAILED" then I analyze the history and get the most relevant fields for my use case (for events of type "TaskFailed"):
response = client.get_execution_history(
    executionArn=EXECUTION_ARN,
    maxResults=1000
)
events = response["events"]
while response.get("nextToken"):
    response = client.get_execution_history(
        executionArn=EXECUTION_ARN,
        maxResults=1000,
        nextToken=response["nextToken"]
    )
    events += response["events"]

causes = [
    json.loads(e["taskFailedEventDetails"]["cause"])
    for e in events
    if e["type"] == "TaskFailed"
]

return [
    {
        "ClusterArn": cause["ClusterArn"],
        "Containers": [
            {
                "ContainerArn": container["ContainerArn"],
                "Name": container["Name"],
                "ExitCode": container["ExitCode"],
                "Overrides": cause["Overrides"]["ContainerOverrides"][i]
            }
            for i, container in enumerate(cause["Containers"])
        ],
        "TaskArn": cause["TaskArn"],
        "StoppedReason": cause["StoppedReason"]
    }
    for cause in causes
]

AWS Java SDK - Running a command using SSM on EC2 instances

I could not find any examples of this online, nor could I find the documentation explaining how to do this. Basically I have a list of Windows EC2 instances and I need to run the quser command in each one of them to check how many users are logged on.
It is possible to do this using the AWS Systems Manager service and running the AWS-RunPowerShellScript command. I only found examples using the AWS CLI, something like this:
aws ssm send-command --instance-ids "instance ID" --document-name "AWS-RunPowerShellScript" --comment "Get Users" --parameters commands=quser --output text
But how can I accomplish this using the AWS Java SDK 1.11.x?
@Alexandre Krabbe it's been more than a year since you asked this question, so I'm not sure the answer will still help you. But I was trying to do the same thing recently, and that led me to this unanswered question. I ended up solving the problem and thought my answer could help other people facing the same issue. Here is a code snippet:
public void runCommand() throws InterruptedException {
    //Command to be run
    String ssmCommand = "ls -l";
    Map<String, List<String>> params = new HashMap<String, List<String>>(){{
        put("commands", new ArrayList<String>(){{ add(ssmCommand); }});
    }};
    int timeoutInSecs = 5;
    //You can add multiple instance ids separated by commas
    Target target = new Target().withKey("InstanceIds").withValues("instance-id");
    //Create the SSM client.
    //The builder could be chosen as per your preferred way of authentication;
    //use withRegion for specifying your region
    AWSSimpleSystemsManagement ssm = AWSSimpleSystemsManagementClientBuilder.standard().build();
    //Build a send command request
    SendCommandRequest commandRequest = new SendCommandRequest()
            .withTargets(target)
            .withDocumentName("AWS-RunShellScript")
            .withParameters(params);
    //The result has a commandId which is used to track the execution further
    SendCommandResult commandResult = ssm.sendCommand(commandRequest);
    String commandId = commandResult.getCommand().getCommandId();
    //Loop until the invocation ends
    String status;
    do {
        ListCommandInvocationsRequest request = new ListCommandInvocationsRequest()
                .withCommandId(commandId)
                .withDetails(true);
        //You get one invocation per EC2 instance that you added to the target.
        //For a single instance use get(0); otherwise loop over the instances
        CommandInvocation invocation = ssm.listCommandInvocations(request).getCommandInvocations().get(0);
        status = invocation.getStatus();
        if (status.equals("Success")) {
            //The command output holds the output of running the command,
            //e.g. the list of directories in the case of ls
            String commandOutput = invocation.getCommandPlugins().get(0).getOutput();
            //Process the output
        }
        //Wait for a few seconds before you check the invocation status again
        try {
            TimeUnit.SECONDS.sleep(timeoutInSecs);
        } catch (InterruptedException e) {
            //Handle not being able to sleep
        }
    } while (status.equals("Pending") || status.equals("InProgress"));
    if (!status.equals("Success")) {
        //Command ended up in a failure
    }
}
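For the Windows quser case from the original question, the same pattern should work by swapping the document name and the command; a small sketch under that assumption (everything else in runCommand stays as above):

//Hypothetical variation for the question's Windows use case:
//only the document name and the command change.
Map<String, List<String>> params = new HashMap<String, List<String>>(){{
    put("commands", new ArrayList<String>(){{ add("quser"); }});
}};
SendCommandRequest commandRequest = new SendCommandRequest()
        .withTargets(target)
        .withDocumentName("AWS-RunPowerShellScript")
        .withParameters(params);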
In SDK 1.11.x, you can also use the built-in waiter, with something like:
val waiter = ssmClient.waiters().commandExecuted()
waiter.run(WaiterParameters(GetCommandInvocationRequest()
    .withCommandId(commandId)
    .withInstanceId(instanceId)
))

How to connect the agent in Amazon Connect in an outbound call

I have a simple contact flow like below from which I trigger the call from Amazon Connect (claimed phone number in AWS Connect) to the end customer (real customer phone number):
Now I want to connect an agent in the Amazon Connect end.
When I trigger the following code, I need the call to go from Amazon Connect (the customer agent) to the end customer (the real customer phone number):
const AWS = require('aws-sdk');
AWS.config.update({ region: 'us-east-1' });

exports.handler = (event, context, callback) => {
    let connect = new AWS.Connect();
    const customerName = event.name;
    const customerPhoneNumber = event.number;
    const dayOfWeek = event.day;

    let params = {
        "InstanceId" : '12345l-abcd-1234-abcde-123456789bcde',
        "ContactFlowId" : '987654-lkjhgf-9875-abcde-poiuyt0987645',
        "SourcePhoneNumber" : '+1123456789',
        "DestinationPhoneNumber" : customerPhoneNumber,
        "Attributes" : {
            'name' : customerName,
            'dayOfWeek' : dayOfWeek
        }
    };

    connect.startOutboundVoiceContact(
        params, function (error, response) {
            if (error) {
                console.log(error);
                callback("Error", null);
            } else {
                console.log('Initiated an outbound call with Contact Id ' + JSON.stringify(response.ContactId));
                callback(null, 'Success');
            }
        }
    );
};
How to add the customer agent in the contact flow?
Logging is not working (Not able to find any logs in CloudWatch AWS)
Is my call recording added in the right section in contact flow?
To connect the call to an agent, you need to add a “set working queue” block to set the call to route to a queue where you have available agents. After you set your queue, replace the “disconnect / hang up” block with a “transfer to queue” block. This will route the call to an available agent or queue the call if no agent is immediately available.
Recording will only occur for the portion of the call between the agent and the outside party, so you won’t see any recordings for calls that didn’t get connected to an agent. Since you have the “set recording behavior” block set to “customer and agent” in your flow already, you should get a recording file when the call gets connected to an agent with the steps above.

ClusterReceptionistExtension doesn't register Subscriber

I am trying to use Akka pub-sub within our application. I have a Play application which is part of an Akka cluster. I want to use the Akka cluster client to make this application listen/subscribe to topics; messages will be published from other applications.
Cluster/Subscriber side code [within Play application]
class MyRealtimeActor extends Actor {
  import DistributedPubSubMediator.{ Subscribe, SubscribeAck }

  def receive = {
    case SubscribeAck(Subscribe("metrics", _)) => {
      Logger.info("SUBSCRIBED TO MESSAGES")
      context become ready
    }
  }

  def ready: Actor.Receive = {
    case m => {
      Logger.info("RECEIVED MESSAGE " + m)
    }
  }
}
and I instantiate it like this in Global:
val cluster: ActorSystem = ActorSystem("ClusterSystem")
val metricsActor = Global.cluster.actorOf(Props(new MyRealtimeActor), "metricsActor")
ClusterReceptionistExtension(cluster).registerSubscriber("metrics", metricsActor)
and the conf file has the following
akka {
  actor {
    provider = "akka.cluster.ClusterActorRefProvider"
    extensions = ["akka.contrib.pattern.DistributedPubSubExtension",
                  "akka.contrib.pattern.ClusterReceptionistExtension"]
  }
  remote {
    log-remote-lifecycle-events = off
    netty.tcp {
      hostname = "127.0.0.1"
      port = 2551
    }
  }
  cluster {
    seed-nodes = [
      "akka.tcp://ClusterSystem#127.0.0.1:2551"
    ]
    auto-down-unreachable-after = 10s
  }
}
When I start the Play application I can see the following log:
[INFO] [11/06/2013 17:48:42.926] [ClusterSystem-akka.actor.default-dispatcher-3] [Cluster(akka://ClusterSystem)] Cluster Node [akka.tcp://ClusterSystem#127.0.0.1:2551] - Node [akka.tcp://ClusterSystem#127.0.0.1:2551] is JOINING, roles []
[INFO] [11/06/2013 17:48:42.942] [ClusterSystem-akka.actor.default-dispatcher-5] [akka://ClusterSystem/deadLetters] Message [akka.contrib.pattern.DistributedPubSubMediator$SubscribeAck] from Actor[akka://ClusterSystem/user/distributedPubSubMediator#1608017981] to Actor[akka://ClusterSystem/deadLetters] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
I would like to know why the actor is not properly subscribed. I am expecting it to print SUBSCRIBED TO MESSAGES.
The thing is that the SubscribeAck is sent to the sender of the Subscribe message and not the actor in the Subscribe message. To get the SubscribeAck sent to the metricsActor, it would have to send the Subscribe itself, and directly to the mediator.
The receptionist is used by the cluster client code, and you shouldn’t use that to subscribe your actors normally.
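For completeness, a minimal sketch of what that looks like when the actor subscribes itself directly via the mediator, assuming the akka-contrib DistributedPubSub extension used in the question (Java API shown; the Scala call is analogous, and the class name here is just illustrative):

import akka.actor.ActorRef;
import akka.actor.UntypedActor;
import akka.contrib.pattern.DistributedPubSubExtension;
import akka.contrib.pattern.DistributedPubSubMediator;

public class MetricsSubscriber extends UntypedActor {
    // Look up the pub-sub mediator of this actor system.
    private final ActorRef mediator =
        DistributedPubSubExtension.get(getContext().system()).mediator();

    @Override
    public void preStart() {
        // Subscribe this actor itself, so the SubscribeAck comes back to it.
        mediator.tell(new DistributedPubSubMediator.Subscribe("metrics", getSelf()), getSelf());
    }

    @Override
    public void onReceive(Object msg) {
        if (msg instanceof DistributedPubSubMediator.SubscribeAck) {
            // Now subscribed to the "metrics" topic
        } else {
            // Published messages arrive here
        }
    }
}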