Do Google Cloud Tasks delete themselves once they have executed?

In my application, I have implemented Google Cloud Tasks so that my users receive a notification when their ToDo item is due.
My main issue is that after my Cloud Task fires, I still see it listed in the Cloud Tasks console. So, do tasks delete themselves once they have fired? For my application, I want the Cloud Tasks to delete themselves once they are done.
I noticed this line in the documentation: "you can also fine-tune the configuration for the task, like scheduling a time in the future when it should be executed or limiting the number of times you want the task to be retried if it fails." The thing is, my task is not failing, and yet I see the number of retries at 4.
firebase cloud functions
exports.firestoreTtlCallback = functions.https.onRequest(async (req, res) => {
  try {
    const payload = req.body;
    let entry = await (await admin.firestore().doc(payload.docPath).get()).data();
    let tokens = await (await admin.firestore().doc(`/users/${payload.uid}`).get()).get('tokens')
    await admin.messaging().sendMulticast({
      tokens,
      notification: {
        title: "App",
        body: entry['text']
      }
    }).then((response) => {
      log('Successfully sent message:')
      log(response)
    }).catch((error) => {
      log('Error in sending Message')
      log(error)
    })
    const taskClient = new CloudTasksClient();
    let { expirationTask } = admin.firestore().doc(payload.docPath).get()
    await taskClient.deleteTask({ name: expirationTask })
    await admin.firestore().doc(payload.docPath).update({ expirationTask: admin.firestore.FieldValue.delete() })
    res.status(200)
  } catch (err) {
    log(err)
    res.status(500).send(err)
  }
})

According to this documentation, a task can be deleted if it is scheduled or dispatched. A task cannot be deleted if it has completed successfully or permanently failed.
The task attempt has succeeded if the app's request handler returns
an HTTP response code in the range [200 - 299].
The task attempt has failed if the app's handler returns a non-2xx
response code or Cloud Tasks does not receive a response before the
deadline, which is:
For HTTP tasks, 10 minutes by default. The deadline must be in the interval [15
seconds, 30 minutes].
For App Engine tasks, 0 indicates that the request has the default
deadline. The default deadline depends on the scaling type of
the service: 10 minutes for standard apps with automatic scaling, 24
hours for standard apps with manual and basic scaling, and 60
minutes for flex apps.
Failed tasks will be retried according to the retry
configuration. Please check your queue.yaml file for the retry
configuration that is set; if you want to specify your own values,
follow this.
The task will be pushed to the worker as an HTTP request. If the worker or the redirected worker acknowledges the task by returning a successful HTTP response code ([200 - 299]), the task will be removed from the queue as per this documentation. If any other HTTP response code is returned or no response is received, the task will be retried according to the following:
User-specified throttling: retry configuration, rate limits, and
the queue's state.
System throttling: To prevent the worker from overloading, Cloud
Tasks may temporarily reduce the queue's effective rate.
User-specified settings will not be changed.
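One thing worth checking against the rule above: in an Express-style handler such as a Firebase HTTPS function, res.status(200) only sets the status code. The response is not sent until send(), end(), or a similar method is called. If no response goes out, Cloud Tasks sees a missed deadline and retries, which may explain a growing retry count on a task that appears to succeed. Below is a minimal sketch of a handler that always ends the response. It assumes a hypothetical doWork function standing in for the Firestore and messaging calls, and makeTaskHandler is likewise an illustrative name.

```javascript
// Sketch only: `doWork` is a hypothetical stand-in for the Firestore and
// FCM calls in the question; it is not part of any Google library.
function makeTaskHandler(doWork) {
  return async function taskHandler(req, res) {
    try {
      await doWork(req.body);
      // send() ends the response, so Cloud Tasks receives the 2xx code.
      res.status(200).send('ok');
    } catch (err) {
      console.error(err);
      // A non-2xx code tells Cloud Tasks the attempt failed and may be retried.
      res.status(500).send('task failed');
    }
  };
}
```

With this shape, the function would be exported as functions.https.onRequest(makeTaskHandler(doWork)). Because a task that has completed successfully is removed from the queue, an explicit deleteTask call inside the handler should not be needed.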

Related

Google GCP cloud run redis client loses connection to the instance

I am running my Node.js application on Google Cloud Run. My application connects to Google Memorystore for Redis. Every few minutes I get the following error:
Error: read Connection Reset
Followed by
AbortError: Redis connection lost and command aborted. It might have been processed.
Please help: what am I missing?
My Node.js code:
const redis = require('redis')
const redisClient = redis.createClient({host:'xxx', port: 6379})
redisClient.on('error', function (err) {
  console.log(err)
})
const data = await redisClient.getExAsync('key')
Use the setInterval function to invoke a Redis operation every minute.
// redisClient is the client created in the question.
// kaCount counts the keep-alive calls (it was not declared in the original snippet).
let kaCount = 0;
function RedisKA() {
  redisClient.get("key2", (err, reply) => {
    kaCount += 1;
    console.log(`${kaCount} redis keep `);
  });
}
let updateIntervalId = setInterval(RedisKA, 60000);
If you want to avoid the request timeout on the Cloud Run side, which is 5 minutes by default, set a value that fits your requirements.
The issue may be caused by a socket timeout. This is expected to happen when there is no activity on the connection for a period of time.
It can be avoided by periodically executing any command on the connection, for example one command per minute, which keeps the socket alive so the connection is not aborted.

Firebase function connection with GCP Redis instance in the same VPC keeps on disconnecting

I am working on multiple Firebase cloud functions (all hosted in the same region) that connect to a GCP-hosted Redis instance in the same region, using a VPC connector. I am using version 3.0.2 of the Node.js library for Redis. In the cloud functions' debug logs, I see frequent connection resets, triggered for each cloud function, with no fixed pattern in their timing. Each time, the error captured in the error event handler is ECONNRESET. When creating the Redis client, I provided a retry_strategy to reconnect after 5 ms with a maximum of 10 such attempts, along with retry_unfulfilled_commands set to true, expecting that any unfulfilled command at the time of a connection reset would be retried automatically (see the code below).
const redisLib = require('redis');
const client = redisLib.createClient(REDIS_PORT, REDIS_HOST, {
  enable_offline_queue: true,
  retry_unfulfilled_commands: true,
  retry_strategy: function(options) {
    if (options.error && options.error.code === "ECONNREFUSED") {
      // End reconnecting on a specific error and flush all commands with
      // a individual error
      return new Error("The server refused the connection");
    }
    if (options.attempt > REDIS_CONNECTION_RETRY_ATTEMPTS) {
      // End reconnecting with built in error
      console.log('Connection retry count exceeded 10');
      return undefined;
    }
    // reconnect after 5 ms
    console.log('Retrying connection after 5 ms');
    return 5;
  },
});
client.on('connect', () => {
  console.log('Redis instance connected');
});
client.on('error', (err) => {
  console.error(`Error connecting to Redis instance - ${err}`);
});
exports.getUserDataForId = (userId) => {
  console.log('getUserDataForId invoked');
  return new Promise((resolve, reject) => {
    if(!client.connected) {
      console.log('Redis instance not yet connected');
    }
    client.get(userId, (err, reply) => {
      if(err) {
        console.error(JSON.stringify(err));
        reject(err);
      } else {
        resolve(reply);
      }
    });
  });
}
// more such exports for different operations
These are the questions and issues I am facing:
1. Why is the connection getting reset intermittently?
2. I have seen logs showing that, even while the cloud function is executing, the connection to the Redis server is lost, resulting in failure of the command.
3. With retry_unfulfilled_commands set to true, I hoped it would handle the scenario described in point 2 above, but according to the debug logs, the cloud function times out in that scenario. This is what I observed in the logs in that case:
getUserDataForId invoked
Retrying connection after 5 ms
Redis instance connected
Function execution took 60002 ms, finished with status: 'timeout' --> coming from wrapper cloud function
4. Should I, instead of having a Redis connection instance at global level, create a connection during each Redis operation? That might cause performance issues, as well as issues with the number of concurrent Redis connections (since I have multiple cloud functions and each simultaneous invocation would create its own Redis connection), right?
So, how should I best handle this? I am facing all these issues during development itself, so I am not sure whether this is a code-related issue or an infrastructure configuration issue.
This behavior could be caused by background activities.
"Background activity is anything that happens after your function has
terminated"
When the background activity interferes with subsequent invocations in Cloud Functions, unexpected behavior and errors that are hard to diagnose may occur. Accessing the network after a function terminates usually leads to "ECONNRESET" errors.
To troubleshoot this, make sure that there is no background activity by searching the logs for entries after the line saying that the invocation finished. Background activity can sometimes be buried deeper in the code, especially when asynchronous operations such as callbacks or timers are present. Review your code to make sure all asynchronous operations finish before you terminate the function.
Source
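As a rough illustration of that last point, a callback-style client.get like the one in the question can be wrapped in a promise and awaited, so the function does not return while a Redis command is still in flight. This is a sketch rather than a confirmed fix. getAsync and handleRequest are hypothetical names, and client is assumed to be any object exposing a Node-style get(key, callback).

```javascript
// Wrap a Node-style callback API in a promise so callers can await it.
function getAsync(client, key) {
  return new Promise((resolve, reject) => {
    client.get(key, (err, reply) => (err ? reject(err) : resolve(reply)));
  });
}

// Hypothetical handler: every asynchronous step is awaited before it returns,
// so no work is left running after the invocation finishes.
async function handleRequest(client, userIds) {
  const replies = await Promise.all(userIds.map((id) => getAsync(client, id)));
  return replies;
}
```

The wrapping cloud function would then await handleRequest(...) before sending its response, so any rejection surfaces inside the invocation instead of after it.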

Avoid webjob waiting when ingest small batch of data to azure data explorer

I have a WebJob that receives site click events from Azure Event Hub and then ingests those events into ADX.
public static async Task Run([EventHubTrigger] EventData[] events, ILogger logger)
{
    // Process events
    try
    {
        var ingestResult = await _adxIngester.IngestAsync(events);
        if (!ingestResult)
        {
            AppInsightLogError();
            logger.LogError();
        }
    }
    catch (Exception ex)
    {
        AppInsightLogError();
        logger.LogError();
    }
}
I've used queued ingestion and turned off FlushImmediately when ingesting to ADX, which enables batch ingestion. When the events do not meet the default ingestion batching policy of 1000 events / 1 GB data size, ADX waits 5 minutes before it returns a Success status, which makes Run wait for that amount of time as well.
public async Task<bool> IngestAsync(...)
{
    IKustoQueuedIngestClient client = KustoIngestFactory.CreateQueuedIngestClient(kustoConnectionString);
    var kustoIngestionProperties = new KustoQueuedIngestionProperties(databaseName: "myDB", tableName: "events")
    {
        ReportLevel = IngestionReportLevel.FailuresOnly,
        ReportMethod = IngestionReportMethod.Table,
        FlushImmediately = false
    };
    var streamIdentifier = Guid.NewGuid();
    var clientResult = await client.IngestFromStreamAsync(...);
    var ingestionStatus = clientResult.GetIngestionStatusBySourceId(streamIdentifier);
    while (ingestionStatus.Status == Status.Pending)
    {
        await Task.Delay(TimeSpan.FromSeconds(15));
        ingestionStatus = clientResult.GetIngestionStatusBySourceId(streamIdentifier);
    }
    if (ingestionStatus.Status == Status.Failed)
    {
        return false;
    }
    return true;
}
Since I don't want my WebJob to wait that long when there are not many events coming in, or when only QA is at work, I made the following changes:
Don't await IngestAsync, which makes Run a synchronous method.
Add a parameter Action onError to IngestAsync and call it when the ingest task fails. Call AppInsightLogError() and logger.LogError() inside onError instead of returning false.
Replace IngestFromStreamAsync with IngestFromStream.
Basically, I want to ensure that the events reach the Azure Queue, and that any exception is thrown, before I poll for the ingestion status. Run then exits without waiting for the status polling, and if anything fails it will be logged.
My questions are:
Is it good practice to avoid having the WebJob wait for minutes? If not, why?
If yes, is my solution good enough for this problem? Otherwise, how
should I do this?
If you are ingesting small batches of data and wish to cut down on the ingestion batching times, please read the following article: https://learn.microsoft.com/en-us/azure/kusto/concepts/batchingpolicy
Ingestion Batching policy allows you to control the batching limits per database or table.
The ingestion is performed in a few phases. One phase is done on the client side, and one phase is done on the server side:
The ingest client code you're using takes your stream and uploads it to a blob, and then sends a message to a queue.
Any exceptions thrown during that phase will indeed be propagated to your code, which is why you should also use a try-catch block, where in the catch block you can log the error message as you suggested.
You can either use IngestFromStreamAsync with the await keyword, or use IngestFromStream. The first option is better if you’d like to release the worker thread and save resources. But choosing between those two doesn’t have anything to do with the polling. The polling is relevant to the second phase.
Kusto’s DataManagement component is constantly listening to messages in the queue, so as soon as it gets to your new message, it will read it and see some metadata information about the new ingestion request, such as the blob URI where the data is stored and such as the Azure table where failures/progress should be updated.
That phase is done remotely by the server side, and you have an option to wait in your client code for each single ingestion and poll until the server completes the ingestion process. If there are any exceptions during that phase, then of course they won’t be propagated to your client code, but rather you’ll be able to examine the Azure table and see what happened.
You can also decide to defer that status examination, and have it done in some other task.
IngestFromStreamAsync uploads your data to a blob and posts a message to the Data Management input queue. It will not wait for the aggregation time, and the final state you will get is Queued.
FlushImmediately defaults to false.
If there isn't any additional processing, consider using the Event Hub to Kusto connection.
[Edited] responding to comments:
The Queued state indicates that the blob is pending ingestion. You can track status with the show ingestion failures command, metrics, and ingestion logs.
The Event Hub connection goes through queued ingestion by default. It will use streaming ingestion only if that is set as a policy on the database or table.
Some of the processing can be done on ADX, using ingestion mapping and update policy.

Google Cloud Tasks HTTP trigger - how to disable retry

I'm trying to create a Cloud Tasks queue that never retries if an HTTP task fails.
According to the documentation, maxAttempts should be what I'm looking for:
Number of attempts per task.
Cloud Tasks will attempt the task maxAttempts times (that is, if the
first attempt fails, then there will be maxAttempts - 1 retries). Must
be >= -1.
So, if maxAttempts is 1, there should be 0 retries.
But, for example, if I run
gcloud tasks queues create test-queue --max-attempts=1 --log-sampling-ratio=1.0
then use the following Python code to create an HTTP task:
from google.cloud import tasks_v2beta3
from google.protobuf import timestamp_pb2
client = tasks_v2beta3.CloudTasksClient()
project = 'project_id' # replace by real project ID
queue = 'test-queue'
location = 'us-central1'
url = 'https://example.com/task_handler' # replace by some endpoint that return 5xx status code
parent = client.queue_path(project, location, queue)
task = {
    'http_request': {  # Specify the type of request.
        'http_method': 'POST',
        'url': url  # The full url path that the task will be sent to.
    }
}
response = client.create_task(parent, task)
print('Created task {}'.format(response.name))
In the Stackdriver logs for the queue (which I can see because I used --log-sampling-ratio=1.0 when creating the queue), the task is apparently retried once: there is one dispatch attempt, followed by a dispatch response with status UNAVAILABLE, followed by another dispatch attempt, which is finally followed by the last dispatch response (also indicating UNAVAILABLE).
Is there any way to retry 0 times?
Note
About maxAttempts, the documentation also says:
This field has the same meaning as task_retry_limit in queue.yaml/xml.
However, when I go to the description for task_retry_limit, it says:
The number of retries. For example, if 0 is specified and the task
fails, the task is not retried at all. If 1 is specified and the task
fails, the task is retried once. If this parameter is unspecified, the
task is retried indefinitely. If task_retry_limit is specified with
task_age_limit, the task is retried until both limits are reached.
This seems to be inconsistent with the description of maxAttempts, as it indicates that the task would be retried once if the parameter is 1.
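Read literally, the two definitions differ by one: maxAttempts counts the first attempt, while task_retry_limit counts only the retries. The helper below is purely illustrative. The function names are made up, and treating -1 as unlimited is an assumption based on the "Must be >= -1" wording. It shows the retry count each reading would imply for the same value.

```javascript
// Retries implied by the maxAttempts description: attempts minus the first try.
// Assumption: -1 means unlimited attempts. Other values below 1 are not
// modelled, since the documentation quoted above does not define them.
function retriesFromMaxAttempts(maxAttempts) {
  if (maxAttempts === -1) return Infinity;
  if (!Number.isInteger(maxAttempts) || maxAttempts < 1) {
    throw new RangeError('only -1 or a positive integer is modelled here');
  }
  return maxAttempts - 1;
}

// Retries implied by the task_retry_limit description: the value itself.
function retriesFromTaskRetryLimit(taskRetryLimit) {
  return taskRetryLimit;
}

console.log(retriesFromMaxAttempts(1));    // 0
console.log(retriesFromTaskRetryLimit(1)); // 1
```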
I've experimented with setting maxAttempts to 0, but that seems to make it assume a default value of 100.
Thank you in advance.
As @averi-kitsch mentioned, this is currently an internal issue that our Cloud Tasks engineering team is working on. Sadly, we don't have an ETA yet.
You can follow the progress of this issue with this Public Issue Tracker; click on the "star" to subscribe to it and receive future updates.
As a workaround, if you don't want the task to be retried after it fails, set "task_retry_limit: 0" directly in queue.yaml.
Example :
queue:
- name: my-queue1
  rate: 1/s
  retry_parameters:
    task_retry_limit: 0

Using Cloud Run on a PubSub topic

It was not clear to me how to use Cloud Run with a Pub/Sub topic for medium-running tasks (within the time limit of Cloud Run, of course).
Let's look at this example taken from the tutorials [1]:
app.post('/', (req, res) => {
  if (!req.body) {
    const msg = 'no Pub/Sub message received'
    console.error(`error: ${msg}`)
    res.status(400).send(`Bad Request: ${msg}`)
    return
  }
  if (!req.body.message) {
    const msg = 'invalid Pub/Sub message format'
    console.error(`error: ${msg}`)
    res.status(400).send(`Bad Request: ${msg}`)
    return
  }
  const pubSubMessage = req.body.message
  const name = pubSubMessage.data
    ? Buffer.from(pubSubMessage.data, 'base64').toString().trim()
    : 'World'
  console.log(`Hello ${name}!`)
  res.status(204).send()
})
My doubt is: should it return HTTP 204 only after the task finishes? Otherwise, will the task be terminated suddenly?
[1] https://cloud.google.com/run/docs/tutorials/pubsub
My doubt is: should it return HTTP 204 only after the task finishes?
Otherwise, will the task be terminated suddenly?
You do not have a choice. If you return before your task/objective finishes, the CPU will be idled to zero and nothing will happen in your Cloud Run instance.
In your example, you are just processing a pub/sub message and extracting the name. If you return before this is finished, no name will be processed.
Cloud Run is designed for an HTTP Request/Response system. This means processing begins when you receive an HTTP Request (GET, POST, PUT, etc.) and ends when your code returns an HTTP Response (or just returns with no response). You might try to create background threads but there is no guarantee that they will execute once your main function returns.
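A minimal sketch of that ordering follows, assuming a hypothetical processMessage function that holds the medium-length work (makePushHandler and decodeMessage are also illustrative names). The handler awaits the work and only then sends 204. Returning a 5xx code on failure is an assumption here, chosen so that Pub/Sub treats the push as unacknowledged and redelivers the message.

```javascript
// Decode the base64 payload of a Pub/Sub push message, as in the tutorial code.
function decodeMessage(message) {
  return message.data
    ? Buffer.from(message.data, 'base64').toString().trim()
    : 'World';
}

// `processMessage` is hypothetical: it stands for the actual task.
function makePushHandler(processMessage) {
  return async function pushHandler(req, res) {
    if (!req.body || !req.body.message) {
      res.status(400).send('Bad Request: invalid Pub/Sub message');
      return;
    }
    try {
      // Finish the work first, while the request is still open.
      await processMessage(decodeMessage(req.body.message));
      res.status(204).send();
    } catch (err) {
      console.error(err);
      // A non-success code leaves the message unacknowledged.
      res.status(500).send();
    }
  };
}
```

In the tutorial's Express app this would be registered as app.post('/', makePushHandler(processMessage)).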