Reusable State Definition in Step Functions

We are creating a workflow composed of multiple SQL operations (aggregations, transposes, etc.) via AWS Step Functions. Every operation is modelled as a separate Lambda function that houses the SQL query.
Now, every query accepts its input parameters from the state machine, so every Lambda task looks like the below:
"SQLQueryTask": {
"Type": "Task",
"Parameters": {
"param1.$": "$$.Execution.Input.param1",
"param2.$": "$$.Execution.Input.param2"
},
"Resource": "LambdaArn",
"End": true
}
The Parameters block thus repeats for every SQLQuery node.
Added to this, since Lambdas can fail intermittently and we would like to retry them, we also need to have the below Retry block in every state:
"Retry": [ {
"ErrorEquals": [ "Lambda.ServiceException", "Lambda.AWSLambdaException", "Lambda.SdkClientException"],
"IntervalSeconds": 2,
"MaxAttempts": 6,
"BackoffRate": 2
} ]
This is making the state definition very complex. Is there no way to extract the common parts of the state definition into a reusable piece?

One solution could be using the AWS CDK (https://aws.amazon.com/cdk/).
It allows developers to define higher-level abstractions over resources, which can easily be reused.
There are some examples here that could be helpful: https://docs.aws.amazon.com/cdk/api/latest/docs/aws-stepfunctions-readme.html
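For illustration, here is a minimal TypeScript sketch of that idea with CDK v2 (aws-cdk-lib): the shared Parameters and Retry configuration live in one factory function, and the Lambda constructs (aggregateFn, transposeFn) are assumed to be defined elsewhere in the stack.

import { Duration } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';
import * as lambda from 'aws-cdk-lib/aws-lambda';

// One place for the shared Parameters and Retry settings.
function sqlQueryTask(scope: Construct, id: string, fn: lambda.IFunction): tasks.LambdaInvoke {
  const task = new tasks.LambdaInvoke(scope, id, {
    lambdaFunction: fn,
    payload: sfn.TaskInput.fromObject({
      param1: sfn.JsonPath.stringAt('$$.Execution.Input.param1'),
      param2: sfn.JsonPath.stringAt('$$.Execution.Input.param2'),
    }),
  });
  task.addRetry({
    errors: ['Lambda.ServiceException', 'Lambda.AWSLambdaException', 'Lambda.SdkClientException'],
    interval: Duration.seconds(2),
    maxAttempts: 6,
    backoffRate: 2,
  });
  return task;
}

// Inside your Stack class, each SQL operation then becomes one line:
const definition = sqlQueryTask(this, 'Aggregate', aggregateFn)
  .next(sqlQueryTask(this, 'Transpose', transposeFn));
new sfn.StateMachine(this, 'SqlWorkflow', { definition });

The CDK synthesizes the repeated Parameters/Retry JSON for every task, so the duplication lives in generated output rather than in hand-written ASL.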

Related

Lambda - Is there a way to conditionally switch between Unreserved and Reserved concurrency in serverless templates?

I am using a serverless template to create a Lambda function in AWS.
If I don't specify any value for the property "ReservedConcurrentExecutions", then the function gets created with Unreserved concurrency.
Now, I would like to use reserved concurrency (or unreserved) depending on an input parameter.
Function with Reserved Concurrency:
"MyFunction": {
"Type": "AWS::Serverless::Function",
"Properties": {
"Handler": "MyFunctionHandler",
"CodeUri": "myfunction.zip",
"ReservedConcurrentExecutions" : 2,
}
}
Function with Unreserved Concurrency (just don't use the ReservedConcurrentExecutions property):
"MyFunction": {
"Type": "AWS::Serverless::Function",
"Properties": {
"Handler": "MyFunctionHandler",
"CodeUri": "myfunction.zip",
}
}
I know I can declare the 2 functions separately and have a Condition to create one or the other.
What I would like to know is if it is possible to have just one function and conditionally add the ReservedConcurrentExecutions property.
Thank you!
The Serverless Framework does not support conditional statements or conditionally applied resource properties, but you can try the "ifelse" plugin.
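Alternatively, if you are open to generating the template with the AWS CDK instead (a hypothetical switch, not part of the original setup), an ordinary language-level conditional can include or omit the property. A minimal TypeScript sketch, where the isProd flag is an assumption you would supply yourself:

import * as lambda from 'aws-cdk-lib/aws-lambda';

const isProd = process.env.STAGE === 'production'; // assumed stage flag

// Passing undefined omits ReservedConcurrentExecutions from the
// synthesized template, leaving the function on unreserved concurrency.
const fn = new lambda.Function(this, 'MyFunction', {
  runtime: lambda.Runtime.NODEJS_18_X,
  handler: 'MyFunctionHandler',
  code: lambda.Code.fromAsset('myfunction.zip'),
  reservedConcurrentExecutions: isProd ? 2 : undefined,
});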

AWS Step Functions IF equivalent

I'm trying to do the following in AWS Step Functions:
IF ExampleState fails, do "Next": "Anotherlambda".
IF ExampleState completes successfully, end the execution.
How can I do that? The Choice state doesn't support ErrorEquals: States.TaskFailed.
In my case, when ExampleState fails, the state machine STOPS and gives me an error, but I want to continue, catch some info from the error, and save it with another Lambda.
Thanks!
All I wanted AWS Step Functions to do is: if a state succeeds, finish the execution; if it fails, run another Lambda, like an IF / ELSE in programming.
Step Functions gives you this easily as a Catch block that only activates if it catches an error, and then does what you want. Here is the solution:
"StartAt": "ExampleLambda",
"States": {
"ExampleLambda": {
"Type": "Task",
"Resource": "xxx:function:ExampleLambda",
"Catch": [
{
"ErrorEquals":["States.TaskFailed"],
"Next": "SendToErrorQueue"
}
],
"End": true
}

AWS Step Functions: How can I invoke multiple instances of the same Lambda in a step function?

I have an SQS queue whose size I monitor from a state machine.
If size > desired size, then I trigger some Lambda functions; otherwise, it waits for 30 seconds and checks the queue size again.
Here is my problem: when the queue length is > 20000, I want to trigger 10 Lambda functions to empty it faster, and if its length is < 2000, then I want to run only 1 Lambda function.
For now, I have hard-coded ten parallel steps, but it's a waste of resources if the queue size is less than 2000.
"CheckSize": {
"Type": "Choice",
"Choices": [
{
"Variable": "$.Payload.size",
"NumericGreaterThan": 2000,
"Next": "invoke_lambda"
},
{
"Variable": "$.Payload.size",
"NumericLessThan": 2000,
"Next": "Wait30s"
}
],
"Default": "Wait30s"
},
AWS Step Functions does not appear to be the best tool for your scenario. I think you should be using one of the SQS metrics available in CloudWatch; it should be ApproximateNumberOfMessagesVisible in your case. You can create an alarm if ApproximateNumberOfMessagesVisible >= 20,000. The action for that alarm would probably be an SNS topic, to which you can subscribe a Lambda function. In that Lambda function you can asynchronously invoke, 10 times, the Lambda function that is supposed to clear down the queue.
Check out the AWS docs for creating a CloudWatch alarm for an SQS metric.
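A rough sketch of that wiring with the CDK in TypeScript; the queue and fanOutFn constructs are assumptions, not from the original answer:

import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import * as cwActions from 'aws-cdk-lib/aws-cloudwatch-actions';
import * as sns from 'aws-cdk-lib/aws-sns';
import * as subs from 'aws-cdk-lib/aws-sns-subscriptions';

// Assumes queue (sqs.Queue) and fanOutFn (lambda.Function) exist elsewhere.
// Alarm when the queue backs up past 20,000 visible messages.
const alarm = new cloudwatch.Alarm(this, 'QueueDepthAlarm', {
  metric: queue.metricApproximateNumberOfMessagesVisible(),
  threshold: 20000,
  evaluationPeriods: 1,
  comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
});

// Alarm -> SNS -> Lambda that fans out the queue-clearing invocations.
const topic = new sns.Topic(this, 'QueueDepthTopic');
alarm.addAlarmAction(new cwActions.SnsAction(topic));
topic.addSubscription(new subs.LambdaSubscription(fanOutFn));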
Using Step Functions:
If you want to do it with Step Functions, then I don't think you need any condition check in your state machine definition. All you need is to pass $.size to a Lambda function and put the condition in that Lambda function: if size >= 20000, asynchronously invoke the queue-processing function 10 times; otherwise, once.
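For illustration, that condition-in-Lambda approach could look like the following TypeScript sketch using AWS SDK v3; ProcessQueueFunction and the event shape are placeholders, not names from the original:

import { LambdaClient, InvokeCommand } from '@aws-sdk/client-lambda';

const client = new LambdaClient({});

// Fan out 10 async invocations when the backlog is large, else 1.
export const handler = async (event: { size: number }) => {
  const copies = event.size >= 20000 ? 10 : 1;
  await Promise.all(
    Array.from({ length: copies }, () =>
      client.send(new InvokeCommand({
        FunctionName: 'ProcessQueueFunction', // placeholder name
        InvocationType: 'Event', // asynchronous, fire-and-forget
      }))
    )
  );
  return { dispatched: copies };
};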
AWS Step Functions now supports dynamic parallelism, so you can optimize the performance and efficiency of application workflows such as data processing and task automation. By running identical tasks in parallel, you can achieve consistent execution durations and improve utilization of resources to save on operating costs. Step Functions automatically scales resources in response to your input.
https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-map-state.html
https://aws.amazon.com/about-aws/whats-new/2019/09/aws-step-functions-adds-support-for-dynamic-parallelism-in-workflows/
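With the Map state, the fan-out count comes from the input itself. A hedged CDK (TypeScript) sketch, assuming a drainFn Lambda and a $.batches array prepared by an earlier state (iterator() is the long-standing CDK method; newer releases name it itemProcessor()):

import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';

// One iteration per element of $.batches, at most 10 running at once.
const fanOut = new sfn.Map(this, 'DrainQueue', {
  itemsPath: sfn.JsonPath.stringAt('$.batches'),
  maxConcurrency: 10,
});
fanOut.iterator(new tasks.LambdaInvoke(this, 'DrainBatch', {
  lambdaFunction: drainFn, // assumed queue-draining function
}));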
Not diving deep into the solution you have come up with, and focusing instead on guidance for your question:
You have essentially answered the question yourself. The simplest solution is to add one more step, called invoke_10_lambdas, and route to it from your Choice state. Pseudocode for your step function would look something like this:
....
....
"CheckSizeAndDivert": {
  "Type": "Choice",
  "Choices": [
    {
      "Variable": "$.Payload.size",
      "NumericGreaterThan": 20000,
      "Next": "invoke_10_lambdas"
    },
    {
      "Variable": "$.Payload.size",
      "NumericGreaterThan": 2000,
      "Next": "invoke_lambda"
    }
  ],
  "Default": "Wait30s"
},
"invoke_10_lambdas": {
  // This is your parallel step.
  ...
  "Next": "whatever comes next (I believe it is Wait30s)"
},
"invoke_lambda": {
  // This is your single-Lambda step.
  ...
  "Next": "whatever comes next (I believe it is Wait30s)"
},
...
...
SQS now supports using Lambda as an event source (EventSourceMapping), so the recommendation would be to have AWS directly take control of this and scale Lambdas as necessary.
An example CloudFormation template would be:
"EventSourceMapping": {
"Type": "AWS::Lambda::EventSourceMapping",
"Properties": {
"BatchSize": 10,
"Enabled": true,
"EventSourceArn" : { "Fn::GetAtt" : ["SQSStandupWork", "Arn"] },
"FunctionName" : {
"Fn::Join": [
":", [
{ "Fn::GetAtt" : ["LambdaFunction", "Arn"] },
"production"
]
]
}
}
}
If you are really set on using a step function to drive this forward, you can create another choice on top of what you currently have:
A - execute 1 Lambda in parallel (A1 => stop) plus a checker (B)
B - call a Lambda that checks the size: return Wait30 if the size is less than 2000 (B1), return Parallel if the size is > 20000 (B2)
B1 - wait 30 seconds and then NEXT: A
B2 - a Parallel state with 9 Lambdas (since the 10th is A) => NEXT: A
Additional alternatives are:
a CloudWatch Events rule to schedule a trigger every 30 seconds
triggering the 10 parallel Lambda functions directly from a separate Lambda. A Lambda could check the size and then directly call the other Lambdas asynchronously. Since it doesn't matter what the result is (we'll check again in 30 seconds), the step function will simply retry.
The biggest problem with your suggested approach is that a step function execution has a 1-year limit, so unless you are sure the queue will be drained within a year, you'll have a problem when you get to the end. Even if you set it up to retrigger a new step function, you'll be paying for a lot of unnecessary state transitions (Step Functions is not the cheapest service).

How to create stages for AWS state machine?

I have created a simple AWS state machine with Lambda functions, like below:
{
  "Comment": "Validates data",
  "StartAt": "ChooseDocumentType",
  "States": {
    "ChooseDocumentType": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.documentType",
          "StringEquals": "RETURN",
          "Next": "ValidateReturn"
        },
        {
          "Variable": "$.documentType",
          "StringEquals": "ASSESSMENT",
          "Next": "ValidateAssessment"
        }
      ],
      "Default": "DefaultState"
    },
    "ValidateReturn": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-west-2:111111111:function:ValidateReturn",
      "Next": "DefaultState"
    },
    "ValidateAssessment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-west-2:111111111:function:ValidateAssessment",
      "Next": "DefaultState"
    },
    "DefaultState": {
      "Type": "Pass",
      "End": true
    }
  }
}
Questions:
1. How do I create stages for this state machine (like production, development, etc.)?
2. Each Lambda function has aliases pointing to different versions, so the development alias always points to the $LATEST version and the production alias points to, let's say, version 2. How do I dynamically associate the state machine's stages with these Lambda aliases, so that the state machine in the development stage uses the Lambda functions with the development alias, and so on?
I am using the AWS console to manage state machines and Lambdas, and I don't see any action to create stages for a state machine.
You can declare the alias and the version in the Lambda ARN:
# default, $LATEST
arn:aws:lambda:us-west-2:111111111:function:ValidateAssessment
# using alias
arn:aws:lambda:us-west-2:111111111:function:ValidateAssessment:development
# using version
arn:aws:lambda:us-west-2:111111111:function:ValidateAssessment:2
Use these in the Step Function definition according to your needs.
Re: # 2, if your main concern is controlling which Lambda alias gets invoked, there is a way you can do that via a single step function.
Your step function state definition would be something like:
{
  "Type": "Task",
  "Resource": "arn:aws:states:::lambda:invoke",
  "Parameters": {
    "InvocationType": "RequestResponse",
    "FunctionName": "someFunction",
    "Qualifier.$": "$.lambdaAlias",
    "Payload": {}
  },
  "End": true
}
So where you execute the step function and would specify the stage if there was such a thing, you'd pass a lambdaAlias parameter. (There's nothing magical about that name, you can pull it from whatever step function input parameter you want.)
The request payload to your Lambda would go in Parameters.Payload.
https://docs.aws.amazon.com/step-functions/latest/dg/connect-lambda.html

AWS Step Functions history event limitation

I use Step Functions for a big loop. So far no problem, but the day my loop exceeded 8000 iterations I came across the "Maximum execution history size" error, the limit being 25000.
Is there a solution for not accumulating the history events?
Otherwise, where can I easily migrate my step functions (3 Lambdas)? AWS Batch would require a lot of code rewriting.
Thanks a lot
One approach to avoid the 25k history event limit is to add a choice state in your loop that takes in a counter or boolean and decides to exit the loop.
Outside of the loop you can put a lambda function that starts another execution (with a different id). After this, your current execution completes normally and another execution will continue to do the work.
Please note that the "LoopProcessor" in the example below must return a variable "$.breakOutOfLoop" to break out of the loop, which must also be determined somewhere in your loop and passed through.
Depending on your use case, you may need to restructure the data you pass around. For example, if you are processing a lot of data, you may want to consider using S3 objects and pass the ARN as input/output through the state machine execution. If you are trying to do a simple loop, one easy way would be to add a start offset (think of it as a global counter) that is passed into the execution as input, and each LoopProcessor Task will increment a counter (with the start offset as the initial value). This is similar to pagination solutions.
Here is a basic example of the ASL structure to avoid the 25k history event limit:
{
  "Comment": "An example looping while avoiding the 25k event history limit.",
  "StartAt": "FirstState",
  "States": {
    "FirstState": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME",
      "Next": "ChoiceState"
    },
    "ChoiceState": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.breakOutOfLoop",
          "BooleanEquals": true,
          "Next": "StartNewExecution"
        }
      ],
      "Default": "LoopProcessor"
    },
    "LoopProcessor": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:ProcessWork",
      "Next": "ChoiceState"
    },
    "StartNewExecution": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:StartNewLooperExecution",
      "Next": "FinalState"
    },
    "FinalState": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME",
      "End": true
    }
  }
}
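For completeness, a sketch of what the StartNewLooperExecution Lambda could do, in TypeScript with AWS SDK v3; the STATE_MACHINE_ARN environment variable and the input shape are assumptions:

import { SFNClient, StartExecutionCommand } from '@aws-sdk/client-sfn';

const client = new SFNClient({});

// Start a fresh execution of the same state machine, carrying the
// loop state forward so the new execution's history starts from zero.
export const handler = async (event: { offset: number }) => {
  await client.send(new StartExecutionCommand({
    stateMachineArn: process.env.STATE_MACHINE_ARN!, // assumed env var
    name: `looper-${event.offset}-${Date.now()}`, // unique execution name
    input: JSON.stringify({ offset: event.offset, breakOutOfLoop: false }),
  }));
  return event;
};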
Hope this helps!
To guarantee the execution of all the steps and their order, Step Functions stores the history of the execution after the completion of each state; this storage is the reason behind the limit on the execution history size.
Having said that, one way to mitigate this limit is by following sunnyD's answer. However, it has the limitations below:
The invoker of a step function (if there is one) will not get the execution output of the complete run. Instead, it gets the output of the first execution in the chain of executions.
The limit on the execution history size has a high chance of changing in future versions, so writing logic around this number would require you to modify the code/configuration every time the limit is increased or decreased.
Another alternative is to arrange the step functions as parent and child step functions. In this arrangement, the parent step function contains a task that loops through the entire data set and creates a new execution of the child step function for each record or set of records (a number that will not exceed the execution history limit of a child SF). A second step in the parent step function waits for a period of time before it checks CloudWatch metrics for the completion of all child executions and exits with the output.
A few things to keep in mind about this solution:
The StartExecution API is throttled with a token bucket of size 500, refilled at 25 per second.
Make sure the wait time in the parent SF is sufficient for the child SFs to finish their executions; otherwise, implement a loop to check for the completion of the child SFs.
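As an alternative to the wait-and-check step, note that Step Functions has a native startExecution.sync service integration that lets the parent task wait for a child execution directly. A hedged CDK (TypeScript) sketch, assuming childStateMachine is defined elsewhere:

import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';

// RUN_JOB renders to states:startExecution.sync, so this parent task
// only completes when the child execution finishes.
const runChild = new tasks.StepFunctionsStartExecution(this, 'RunChild', {
  stateMachine: childStateMachine,
  integrationPattern: sfn.IntegrationPattern.RUN_JOB,
  input: sfn.TaskInput.fromJsonPathAt('$.batch'), // assumed per-batch input
});

This removes the need to tune the wait time or poll CloudWatch metrics, at the cost of the parent holding one state active per running child.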