How to retrieve the current state of a running Step Functions execution in AWS

I'm giving AWS Step Functions a try and I'm interested in them for implementing long-running procedures. One piece of functionality I would like to provide to my users is the possibility of showing an execution's progress. Using describeExecution I can verify whether an execution is still running or done, but progress is a logical measure and Step Functions itself has no way to tell me how much of the process is left.
For that, I need to provide the logic myself. I can measure progress in the tasks of the state machine, knowing the total number of steps to take and counting the number of steps already taken. I can store this information in the state of the machine, which is passed between steps while the machine is running. But how can I extract this state using the API? Of course, I could store this information in external storage like DynamoDB, but that's not very elegant!
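To make this concrete, here is a rough sketch of what I mean (the progress shape and the handler are made up for illustration): each Lambda task increments a counter that travels with the machine's state:

// Hypothetical Lambda task: carries a progress counter through the machine's state.
exports.handler = async (event) => {
    const progress = event.progress || { stepsDone: 0, totalSteps: 5 };
    // ... do this step's actual work here ...
    progress.stepsDone += 1;
    // Returning it makes it part of the output passed to the next state,
    // so it also shows up in the execution's event history.
    return { ...event, progress };
};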

The solution I have found myself (so far the only one) is the getExecutionHistory API. This API returns a list of events generated for the execution, and each event can include an input or an output (or neither), depending on whether it marks a Lambda function starting or exiting. You can call the API like this:
var AWS = require('aws-sdk');
var stepfunctions = new AWS.StepFunctions();

var params = {
    executionArn: 'STRING_VALUE', /* required */
    maxResults: 10,
    reverseOrder: true
};
stepfunctions.getExecutionHistory(params, function(err, data) {
    if (err) console.log(err, err.stack); // an error occurred
    else console.log(data);               // successful response
});
By reversing the order of the list of events, we get the latest ones first. Then we can look for the latest output in the list: the first one we find is the most recent output, i.e. the current state of the execution.
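For example, a minimal sketch of that scan (stateExitedEventDetails is where task outputs appear in the SDK's response; the progress field matches the counter idea from the question):

stepfunctions.getExecutionHistory(params, function(err, data) {
    if (err) return console.log(err, err.stack);
    // With reverseOrder: true the newest events come first, so the first
    // event carrying an output is the most recent snapshot of the state.
    for (var i = 0; i < data.events.length; i++) {
        var details = data.events[i].stateExitedEventDetails;
        if (details && details.output) {
            var state = JSON.parse(details.output);
            console.log('current progress:', state.progress);
            break;
        }
    }
});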

Related

How can I prevent a DynamoDB Stream handler from infinitely processing a record when I use batchItemFailures

I have a dynamoDB stream which is triggering a lambda handler that looks like this:
let failedRequestId: string
await asyncForEachSerial(event.Records, async (record) => {
    try {
        await handle(record.dynamodb.OldImage, record.dynamodb.NewImage, record, context)
        return true
    } catch (e) {
        failedRequestId = record.dynamodb.SequenceNumber
    }
    return false // break
})
return {
    batchItemFailures: [{ itemIdentifier: failedRequestId }]
}
I have my lambda set up with a DestinationConfig.onFailure pointing to a DLQ I configured in SQS. The idea behind the handler is to process a batch of events and interrupt at the first failure. Then it reports the most recent failure in 'batchItemFailures' which tells the stream to continue at that record next try. (I pulled the idea from this article)
My current issue is that if there is a genuine failure of my handle() function on one of those records, my exit code marks that record as the checkpoint for the next handler call. However, the DLQ condition never triggers and I end up processing that record over and over again. I should also note that I am trying to avoid reprocessing records multiple times, since handle() is not idempotent.
How can I elegantly handle errors while maintaining batching, but without triggering my handle() function more than once for well-behaved stream records?
I'm not sure if you have found the answer you were looking for; I'll respond in case someone else comes across this issue.
There are 2 other parameters you'd want to use to avoid that issue. Quoting the documentation (https://docs.aws.amazon.com/lambda/latest/dg/with-ddb.html):
Retry attempts – The maximum number of times that Lambda retries when the function returns an error. This doesn't apply to service errors or throttles where the batch didn't reach the function.
Maximum age of record – The maximum age of a record that Lambda sends to your function.
Basically, you'll have to specify how many times failures should be retried and how far back in the stream Lambda should look. By default, Lambda retries a failing batch until the record expires from the stream, which is why your DLQ never receives anything.
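As a rough sketch (using the AWS SDK for JavaScript v2; the mapping UUID and the numbers are placeholders), you could set both on the existing event source mapping like this:

var AWS = require('aws-sdk');
var lambda = new AWS.Lambda();

lambda.updateEventSourceMapping({
    UUID: 'YOUR_MAPPING_UUID',       // the DynamoDB-stream-to-Lambda mapping
    MaximumRetryAttempts: 2,         // stop retrying a poison record after 2 attempts
    MaximumRecordAgeInSeconds: 3600  // discard records older than an hour
}, function(err, data) {
    if (err) console.log(err, err.stack);
    else console.log(data);          // discarded records then go to your onFailure destination
});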

How to query big data in DynamoDB following best practices

I have a scenario: query the list of students in a school by year, and then use that information to do some other tasks, let's say printing a certificate for each student.
I'm using the serverless framework to deal with that scenario with this Lambda:
const queryStudent = async (_school_id, _year) => {
    var params = {
        TableName: `schoolTable`,
        // use expression attribute values; raw variable names inside the
        // expression string would throw a ValidationException
        KeyConditionExpression: 'partition_key = :school_id AND begins_with(sort_key, :year)',
        ExpressionAttributeValues: {
            ':school_id': _school_id,
            ':year': _year,
        },
    };
    try {
        let _students = [];
        let items;
        do {
            items = await dynamoClient.query(params).promise();
            _students = _students.concat(items.Items); // accumulate pages instead of overwriting
            params.ExclusiveStartKey = items.LastEvaluatedKey;
        } while (typeof items.LastEvaluatedKey != 'undefined');
        return _students;
    } catch (e) {
        console.log('Error: ', e);
    }
};
const mainHandler = async (event, context) => {
    …
    let students = await queryStudent(body.school_id, body.year);
    await printCertificate(students)
    …
}
So far, it’s working well with about 5k students (just sample data)
My concern: is it a scalable solution to query large data in DynamoDB?
As I know, Lambda has limited execution time; if the number of students goes up to a million, does the above solution still work?
Any best-practice approach for this scenario would be appreciated and welcome.
If you think about scaling, there are multiple potential bottlenecks here, which you could address:
Hot Partition: right now you store all students of a single school in a single item collection. That means that they will be stored on a single storage node under the hood. If you run many queries against this, you might run into throughput limitations. You can use things like read/write sharding here, e.g. add a suffix to the partition key and do scatter-gather with the data (sketched after these points).
Lambda: Query: if you want to query a million records, this is going to take time. Lambda might not be able to do that (and the processing) in 15 minutes, and if it fails before it's completely through, you lose the information about how far you've got. You could do checkpointing for this, i.e. save the LastEvaluatedKey somewhere else, check whether it exists on new Lambda invocations, and start from there.
Lambda: Processing: you seem to be creating a certificate for each student in a year in the same Lambda function that does the querying. This won't scale if it's a synchronous process and you have a million students. If stuff fails, you also have to consider retries and build that logic into your code.
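Here is a rough sketch of the scatter-gather idea from the "Hot Partition" point (SHARD_COUNT, the key names, and the suffix scheme are assumptions; per-shard pagination is omitted for brevity):

const SHARD_COUNT = 10;

const queryAllShards = async (schoolId, year) => {
    const shardQueries = [];
    for (let shard = 0; shard < SHARD_COUNT; shard++) {
        // Scatter: each shard gets its own partition key suffix,
        // spreading the school's items across storage nodes.
        shardQueries.push(dynamoClient.query({
            TableName: 'schoolTable',
            KeyConditionExpression: 'partition_key = :pk AND begins_with(sort_key, :year)',
            ExpressionAttributeValues: {
                ':pk': `${schoolId}#${shard}`,
                ':year': year,
            },
        }).promise());
    }
    // Gather: run the shard queries in parallel and merge the results.
    const results = await Promise.all(shardQueries);
    return results.flatMap((r) => r.Items);
};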
If you want this to scale to a million students per school, I'd probably change the architecture to something like this:
You have a Step Function that you invoke when you want to print the certificates. This Step Function has a single Lambda function. The Lambda function queries the table across the sharded partition keys and writes each student into an SQS queue of certificate-printing tasks. If the Lambda function notices it's close to the runtime limit, it returns the LastEvaluatedKey; the Step Function recognizes that and starts the function again with this offset. The SQS queue can invoke Lambda functions to actually create the certificates, possibly in batches.
This way you decouple query from processing and also have built-in retry logic for failed tasks in the form of the SQS/Lambda integration. You also include the checkpointing for the query across many items.
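A rough sketch of that checkpointing Lambda (the event shape, table and key names, and the 30-second safety margin are all assumptions):

const AWS = require('aws-sdk');
const dynamoClient = new AWS.DynamoDB.DocumentClient();

exports.handler = async (event, context) => {
    const params = {
        TableName: 'schoolTable',
        KeyConditionExpression: 'partition_key = :school_id',
        ExpressionAttributeValues: { ':school_id': event.school_id },
        ExclusiveStartKey: event.lastEvaluatedKey, // undefined on the first invocation
    };
    let items;
    do {
        items = await dynamoClient.query(params).promise();
        // ... write each student to the certificate-printing SQS queue here ...
        params.ExclusiveStartKey = items.LastEvaluatedKey;
        // Hand the checkpoint back to the Step Function before we time out.
        if (items.LastEvaluatedKey && context.getRemainingTimeInMillis() < 30000) {
            return { done: false, lastEvaluatedKey: items.LastEvaluatedKey };
        }
    } while (items.LastEvaluatedKey);
    return { done: true }; // the state machine can stop looping here
};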
Implementing this requires more effort, so I'd first figure out whether a million students per school per year is a realistic number :-)

Azure Queue Storage Triggered-Webjob - Dequeue Multiple Messages at a time

I have an Azure Queue Storage-Triggered Webjob. The process my webjob performs is to index the data into Azure Search. Best practice for Azure Search is to index multiple items together instead of one at a time, for performance reasons (indexing can take some time to complete).
For this reason, I would like for my webjob to dequeue multiple messages together so I can loop through, process them, and then index them all together into Azure Search.
However I can't figure out how to get my webjob to dequeue more than one at a time. How can this be accomplished?
Based on your description, I suggest you try the Microsoft.Azure.WebJobs.Extensions.GroupQueueTrigger package to achieve your requirement.
This extension enables you to trigger functions with a group of messages instead of a single message, as with [QueueTrigger].
For more details, refer to the code sample and article below.
Install:
Install-Package Microsoft.Azure.WebJobs.Extensions.GroupQueueTrigger
Program.cs:
static void Main()
{
    var config = new JobHostConfiguration
    {
        StorageConnectionString = "...",
        DashboardConnectionString = "...."
    };
    config.UseGroupQueueTriggers();
    var host = new JobHost(config);
    host.RunAndBlock();
}
Function.cs:
// Receive 10 messages at one time
public static void MyFunction([GroupQueueTrigger("queue3", 10)] List<string> messages)
{
    foreach (var item in messages)
    {
        Console.WriteLine(item);
    }
}
How would I get that changed to a GroupQueueTrigger? Is it an easy change?
In my opinion, it is an easy change.
You could follow the steps below:
1. Install the package Microsoft.Azure.WebJobs.Extensions.GroupQueueTrigger from the NuGet package manager.
2. Change the Program.cs file to enable UseGroupQueueTriggers.
3. Change the WebJob functions according to your old triggered function.
Note: the group queue message trigger must bind to a List.
As my code sample shows:
public static void MyFunction([GroupQueueTrigger("queue3", 10)]List<string> messages)
This function will get 10 messages from "queue3" at one time, so in this function you could loop over the list of messages, process them, then index them all together into Azure Search.
4. Publish your WebJob to Azure Web Apps.

Postman - how to loop request until I get a specific response?

I'm testing API with Postman and I have a problem:
My request goes to a sort of middleware, so either I receive a full 1000+ line JSON, or I receive a PENDING status and an empty results array:
{
    "meta": {
        "status": "PENDING",
        "missing_connectors_count": 0,
        "xxx_type": "INTERNATIONAL"
    },
    "results": []
}
The question is: how do I loop this request in Postman until I get status SUCCESS and a results array with length > 0?
When I'm sending those requests manually one-by-one it's ok, but when I'm running them through Collection Runner, "PENDING" messes up everything.
I found an awesome post about retrying a failed request by Christian Baumann, which allowed me to find a suitable approach to the exact same problem of first polling the status of some operation and only running the actual tests once it's complete.
The code I'd end up with, if I were you, is:
const maxNumberOfTries = 3;     // your max number of tries
const sleepBetweenTries = 5000; // your interval between attempts

if (!pm.environment.get("tries")) {
    pm.environment.set("tries", 1);
}

const jsonData = pm.response.json();
if ((jsonData.meta.status !== "SUCCESS" && jsonData.results.length === 0) && (pm.environment.get("tries") < maxNumberOfTries)) {
    const tries = parseInt(pm.environment.get("tries"), 10);
    pm.environment.set("tries", tries + 1);
    setTimeout(function() {}, sleepBetweenTries); // empty timer: Postman waits for it before moving on
    postman.setNextRequest(request.name);         // re-run this same request
} else {
    pm.environment.unset("tries");
    // your actual tests go here...
}
What I liked about this approach is that the call postman.setNextRequest(request.name) doesn't have any hardcoded request names. The downside I see is that if you run such a request as part of a collection, it will be repeated a number of times, which might bloat your logs with unnecessary noise.
The alternative I was considering is writing a Pre-request Script which polls (by sending a request) and spins until the status indicates completion. The downside of that approach is the need for much more code for the same logic.
When waiting for services to be ready, or when polling for long-running job results, I see 4 basic options:
Use the Postman collection runner or newman and set a per-step delay. This delay is inserted between every step in the collection. Two challenges here: it can be fragile unless you set the delay to a value the request duration will never exceed, and, frequently, only a small number of steps need that delay, so you increase total test run time, creating excessive build times on a shared build server and delaying other pending builds.
Use https://postman-echo.com/delay/10 where the last URI element is the number of seconds to wait. This is simple and concise and can be inserted as a single step after the long-running request. The challenge is that if the request duration varies widely, you may get false failures because you didn't wait long enough.
Retry the same step until success with postman.setNextRequest(request.name);. The challenge here is that Postman will execute the request as fast as it can, which can DDoS your service, get you blacklisted (and cause false failures), and chew up a lot of CPU if run on a shared build server, slowing other builds.
Use setTimeout() in a Pre-request Script. The only downside I see in this approach is that if you have several steps needing this logic, you end up with some cut-and-paste code that you need to keep in sync.
Note: there are minor variations on these - like setting them on a collection, a collection folder, a step, etc.
I like option 4 because it provides the right level of granularity for most of my cases. Note that this appears to be the only way to "sleep" in a Postman script: standard JavaScript sleep methods like a Promise with async and await are not supported, and the sandbox's lodash _.delay(function() {}, delay, args[...]) does not hold script execution in the Pre-request Script.
In Postman standalone app v6.0.10, set your step Pre-request script to:
console.log('Waiting for job completion in step "' + request.name + '"');
// Construct our request URL from environment variables
var url = request['url'].replace('{{host}}', postman.getEnvironmentVariable('host'));
var retryDelay = 1000;
var retryLimit = 3;

function isProcessingComplete(retryCount) {
    pm.sendRequest(url, function (err, response) {
        if (err) {
            // hmmm. Should I keep trying or fail this run? Just log it for now.
            console.log(err);
        } else {
            // I could also check for response.json().results.length > 0, but that
            // would omit SUCCESS with empty results which may be valid
            if (response.json().meta.status !== 'SUCCESS') {
                if (retryCount < retryLimit) {
                    console.log('Job is still PENDING. Retrying in ' + retryDelay + 'ms');
                    setTimeout(function() {
                        isProcessingComplete(++retryCount);
                    }, retryDelay);
                } else {
                    console.log('Retry limit reached, giving up.');
                    postman.setNextRequest(null);
                }
            }
        }
    });
}

isProcessingComplete(1);
And you can do your standard tests in the same step.
Note: Standard caveats apply to making retryLimit large.
Try this:
var body = JSON.parse(responseBody);
if (body.meta.status !== "SUCCESS" && body.results.length === 0) {
    postman.setNextRequest("This_same_request_title");
} else {
    postman.setNextRequest("Next_request_title");
    /* you can also try postman.setNextRequest(null); */
}
I was searching for an answer to the same question and thought of a possible solution as I was reading your question.
Use a Postman workflow to rerun your request every time you don't get the response you're looking for. Anyway, that's what I'm gonna try.
postman.setNextRequest("request_name");
https://www.getpostman.com/docs/workflows
I didn't manage to find complete guidelines for this issue, which is why I decided to invest some time and describe all the steps of the process from A to Z.
I will walk through an example where we need to iterate over a list of transaction ids, changing the query param to the next transaction id from the list on each iteration.
Step 1. Prepare your request
https://some url/{{queryParam}}
Add the {{queryParam}} variable so it can be changed from the pre-request script.
If you need a token for the request, add it here, in the Authorization tab.
Save the request to a collection (the Save button in the right corner). For demonstration purposes I will use the name "Transactions Request". We will need this name later on.
Step 2. Prepare pre-request script
In Postman, use the Pre-request Script tab to change the transactionId variable from the query param placeholder to an actual transaction id.
let ids = pm.collectionVariables.get("TransactionIds");
ids = JSON.parse(ids);
const id = ids.shift();
console.log('id', id);
postman.setEnvironmentVariable("transactionId", id);
pm.collectionVariables.set("TransactionIds", JSON.stringify(ids));
pm.collectionVariables.get - gets the array of transaction ids from the collection variables. We will set it up in Step 4.
ids.shift() - removes the id we are about to use from the list (to prevent running on the same id twice)
postman.setEnvironmentVariable("transactionId", id) - changes the transaction id placeholder in the query param to an actual transaction id
pm.collectionVariables.set("TransactionIds", JSON.stringify(ids)) - writes the list back to the collection variable, now without the id that was just handled
Step 3. Prepare Tests
In Postman, use the Tests tab to create the loop logic. Tests are executed after the request, so we can use them to trigger the next request.
let ids = pm.collectionVariables.get("TransactionIds");
ids = JSON.parse(ids);
if (ids && ids.length > 0) {
    console.log('length', ids.length);
    postman.setNextRequest("Transactions Request");
} else {
    postman.setNextRequest(null);
}
postman.setNextRequest("Transactions Request") - calls a new request, in this case it will call the "Transactions Request" request
Step 4. Run Collections
In Postman, choose Collections from the left sidebar, click the collection, and then open the Variables tab.
These are the collection variables. In our example we used TransactionIds as the variable, so put the array of transaction ids you want to loop over in Current Value.
Now you can click Run (the button in the right corner, near the Save button) to run our looped requests.
You will be asked to choose which request you want to run. Choose the request we created: "Transactions Request".
It will run our request with the pre-request script and the logic we set in Tests. At the end, Postman will open a new window with a summary of our run.

How to lock a long async call in a WebApi action?

I have this scenario where I have a WebApi and an endpoint that when triggered does a lot of work (around 2-5min). It is a POST endpoint with side effects and I would like to limit the execution so that if 2 requests are sent to this endpoint (should not happen, but better safe than sorry), one of them will have to wait in order to avoid race conditions.
I first tried to use a simple static lock inside the controller like this:
lock (_lockObj)
{
    var results = await _service.LongRunningWithSideEffects();
    return Ok(results);
}
This is, of course, not possible because of the await inside the lock statement.
Another solution I considered was to use a SemaphoreSlim implementation like this:
await semaphore.WaitAsync();
try
{
    var results = await _service.LongRunningWithSideEffects();
    return Ok(results);
}
finally
{
    semaphore.Release();
}
However, according to MSDN:
The SemaphoreSlim class represents a lightweight, fast semaphore that can be used for waiting within a single process when wait times are expected to be very short.
Since in this scenario the wait times may even reach 5 minutes, what should I use for concurrency control?
EDIT (in response to plog17):
I do understand that passing this task onto a service might be the optimal way, however, I do not necessarily want to queue something in the background that still runs after the request is done.
The request involves other requests and integrations that take some time, but I would still like the user to wait for this request to finish and get a response regardless.
This request is expected to be only fired once a day at a specific time by a cron job. However, there is also an option to fire it manually by a developer (mostly in case something goes wrong with the job) and I would like to ensure the API doesn't run into concurrency issues if the developer e.g. double-sends the request accidentally etc.
If only one request of that sort can be processed at a given time, why not implement a queue?
With such a design, there is no more need to lock or wait while processing the long-running request.
Flow could be:
Client POSTs /RessourcesToProcess and should receive 202-Accepted quickly
HttpController simply queues the task (and returns the 202-Accepted)
Another service (a Windows service?) dequeues the next task
Processes the task
Updates the resource status
During this process, client should be easily able to get status of requests previously made:
If the task is not found: 404-NotFound. Resource not found for id 123
If the task is processing: 200-OK. 123 is processing.
If the task is done: 200-OK. Process response.
Your controller could look like:
public class TaskController
{
    //constructor and private members

    [HttpPost, Route("")]
    public IHttpActionResult QueueTask(RequestBody body)
    {
        messageQueue.Add(body);
        return StatusCode(HttpStatusCode.Accepted); // 202-Accepted
    }

    [HttpGet, Route("{taskId}")]
    public IHttpActionResult GetTask(string taskId)
    {
        YourThing thing = tasksRepository.Get(taskId);
        if (thing == null)
        {
            return NotFound(); // thing does not exist
        }
        if (thing.IsProcessing)
        {
            return Ok("thing is processing");
        }
        if (!thing.IsDone) // assumes YourThing distinguishes "queued" from "done"
        {
            return Ok("thing is not processing yet");
        }
        // here we assume thing has been processed
        return Ok(thing.ResponseContent);
    }
}
This design suggests that you do not handle long-running processes inside your WebApi. Indeed, that may not be the best design choice. If you still want to do so, you may want to read:
Long running task in WebAPI
https://blogs.msdn.microsoft.com/webdev/2014/06/04/queuebackgroundworkitem-to-reliably-schedule-and-run-background-processes-in-asp-net/