I'm constructing my first state machine using AWS Step Functions, and inside the state machine I'm invoking Go Lambdas. I start the execution of the state machine from another Lambda, passing some input I'd like to reference in different parts of the state machine. I notice I'm losing that input between the LambdaFunctionScheduled and LambdaFunctionStarted events, and as a result I don't seem to have an event in my Lambda where I can grab the pieces of info I need. Am I missing a step?
Here is the state machine I'm creating in Terraform:
resource "aws_sfn_state_machine" "bulk_state_machine" {
name = "bulk_state_machine"
role_arn = "${aws_iam_role.bulk_state_machine_role.arn}"
definition = <<EOF
{
"Comment": "A state machine to orchestrate a series of Lambdas that complete the bulk provisioning process",
"StartAt": "CreateBuckets",
"States": {
"CreateBuckets": {
"Type": "Task",
"Resource": "${aws_lambda_function.createBulkProvisionBuckets.arn}",
"End": true
}
}
}
EOF
}
And this is the struct of input I'm marshalling into JSON and sending along as input in the Lambda that begins execution of the Step Function:
sfnInput := models.BulkSFNInput{
    DefaultRegion: brand.DefaultRegion,
    OtherRegions:  brand.OtherRegions,
    ACMARN:        brand.ACMARN,
}
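For reference, the execution input passed to StartExecution arrives unchanged as the event of the first Task state (CreateBuckets above), so the Go Lambda can unmarshal it directly; there is no separate event to fish it out of. A minimal handler sketch, assuming the field names of models.BulkSFNInput shown above (the concrete field types, such as the slice for OtherRegions, are guesses):

package main

import (
    "context"

    "github.com/aws/aws-lambda-go/lambda"
)

// BulkSFNInput mirrors models.BulkSFNInput from the snippet above.
type BulkSFNInput struct {
    DefaultRegion string   `json:"DefaultRegion"`
    OtherRegions  []string `json:"OtherRegions"`
    ACMARN        string   `json:"ACMARN"`
}

// The event for the first Task state is the execution input itself.
// Returning it (possibly augmented) passes it on to later states.
func handler(ctx context.Context, input BulkSFNInput) (BulkSFNInput, error) {
    // ... create buckets using input.DefaultRegion and input.OtherRegions ...
    return input, nil
}

func main() {
    lambda.Start(handler)
}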
I have a StepFunction with input:
{
"Jobs": [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0]
}
and I want the sum of the values in this array. According to the JsonPath docs, there should be a .sum() function for this, and when I try it here it even works. So I defined the following Pass state:
"Sum Jobs": {
"Type": "Pass",
"Parameters": {
"Jobs.$": "$.Jobs.sum()"
}
},
Nevertheless executions fail with:
"An error occurred while executing the state 'Sum Jobs' (entered at the event id #249). The JSONPath '$.Jobs.sum()' specified for the field 'Jobs.$' could not be found in the input '{\"Jobs\":[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0]}'"
You will need a Lambda Task for this. Step Functions' intrinsic functions (operations accessible outside of Task states) do not include any math or array-manipulation operations.
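A minimal Go sketch of such a Task's handler, assuming the {"Jobs": [...]} input shown above and keeping the same output key the Pass state tried to produce:

package main

import (
    "context"

    "github.com/aws/aws-lambda-go/lambda"
)

type JobsInput struct {
    Jobs []int `json:"Jobs"`
}

type JobsOutput struct {
    Jobs int `json:"Jobs"`
}

// handler does what the Pass state could not: sum the Jobs array.
func handler(ctx context.Context, in JobsInput) (JobsOutput, error) {
    sum := 0
    for _, n := range in.Jobs {
        sum += n
    }
    return JobsOutput{Jobs: sum}, nil
}

func main() {
    lambda.Start(handler)
}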
I'm trying to do the following in AWS Step Functions:
IF ExampleState fails, do "Next": "Anotherlambda".
IF ExampleState completes successfully, end the execution.
How can I do that? The Choice state doesn't support ErrorEquals: States.TaskFailed.
In my case, when ExampleState fails, the state machine stops and gives me an error, but I want to continue, catch some info from the error, and save it with another Lambda.
Thanks!
All I wanted AWS Step Functions to do is: if a state succeeds, finish the execution; if it fails, run another Lambda. Like an IF/ELSE in programming.
Step Functions makes this easy with a Catch block, which activates only when it catches an error and then does what you want. Here is the solution:
"StartAt": "ExampleLambda",
"States": {
"ExampleLambda": {
"Type": "Task",
"Resource": "xxx:function:ExampleLambda",
"Catch": [
{
"ErrorEquals":["States.TaskFailed"],
"Next": "SendToErrorQueue"
}
],
"End": true
}
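The SendToErrorQueue state itself is omitted above; it would be another Task. Because the Catch specifies no ResultPath, the error output (an object with "Error" and "Cause" fields) becomes the input of that state. A hedged Go sketch of what that Lambda might look like:

package main

import (
    "context"
    "log"

    "github.com/aws/aws-lambda-go/lambda"
)

// CatchInput is the default error output a Catch passes to its Next state.
type CatchInput struct {
    Error string `json:"Error"` // e.g. "States.TaskFailed"
    Cause string `json:"Cause"` // JSON string with details of the failure
}

func handler(ctx context.Context, in CatchInput) error {
    log.Printf("caught %s: %s", in.Error, in.Cause)
    // ... send in.Cause to an SQS error queue here (details omitted) ...
    return nil
}

func main() {
    lambda.Start(handler)
}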
I am attempting to create an AWS StepFunctions workflow where I have a Lambda task followed by an ECS/Fargate task.
The Lambda takes an ID as input and outputs some data in JSON form that is used by the ECS task, which runs a Python script in its container environment. What I would like to do in Step Functions is the following flow:
{ id: 1234 } -> [Lambda] -> { id: 1234, data: {...} }
{ id: 1234, data: {...} } -> [ECS] -> { id: 1234, result: "bar"}
For reference, here is an example configuration of an ECS Task:
https://docs.aws.amazon.com/step-functions/latest/dg/sample-project-container-task-notification.html
I cannot figure out any way to pass the structured JSON input of a ECS Task to the container running the task.
Here are the things I have found so far:
I can pass individual fields of a JSON input to the container by using JSONPath to select them and assign them to environment variables. But if I assign the entire input object ($) to an environment variable, it fails at runtime with a serialization error ([Object] cannot be converted to a string).
I can create an intermediate lambda that takes the input and converts it to a JSON string that is stored in a single key-value in the output, then assign this single string key-value to an environment variable of ECS Task and parse it. However, this requires adding an entire extra Task and a few seconds of runtime + cost.
Here are some things I can't do:
There doesn't seem to be any mechanism in boto3 to get the input of an existing ECS Task. I can get the input of an unassigned Activity, or I can get the input of the entire Execution. But there is no API for just getting the input of an existing, running Task, even though I have a Task Token.
I cannot modify my original Lambda to output JSON as a string. I am using this result in multiple places (parallel tasks), and the other tasks are Lambdas that consume specific subfields of the output as their input.
What is the intended mechanism to pass a structured JSON object defined as the input to a Task to the executing container of an ECS/Fargate Task?
You can use intrinsic functions to format the request before running the task:
const formatRequest = new sfn.Pass(this, 'FormatRequest', {
  parameters: {
    'request.$': 'States.JsonToString($)'
  }
})
Given that you don't specify a ResultPath in the step that runs the Lambda, the input of your container will be the output of your Lambda, which translates to:
"Overrides": {
"ContainerOverrides": [
{
"Name": "container-name",
"Environment": [
{
"Name": "SOME_ENV_VAR",
"Value.$": "$"
},
But even this is limited to what you can store as ENV, so you would need to make sure your JSON is actually a string
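On the container side you then parse that string back into structured data. A sketch (in Go here for consistency with the rest of this document, though the question's container runs Python; SOME_ENV_VAR and the {id, data} shape follow the examples above):

package main

import (
    "encoding/json"
    "log"
    "os"
)

// TaskInput mirrors the { id, data } shape from the question.
type TaskInput struct {
    ID   int                    `json:"id"`
    Data map[string]interface{} `json:"data"`
}

func main() {
    raw := os.Getenv("SOME_ENV_VAR") // the JSON string set via ContainerOverrides
    var in TaskInput
    if err := json.Unmarshal([]byte(raw), &in); err != nil {
        log.Fatalf("could not parse task input: %v", err)
    }
    log.Printf("processing id %d", in.ID)
}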
What is the intended mechanism to pass a structured JSON object defined as the input to a Task to the executing container of an ECS/Fargate Task?
Take a look at the Input and Output processing docs: https://docs.aws.amazon.com/step-functions/latest/dg/concepts-input-output-filtering.html
This will help you decide which parts of the JSON input you want passed to the "Run Fargate Task" state (from the example you linked in your question).
Step Functions supports ECS's RunTask and a couple of its parameters: https://docs.aws.amazon.com/step-functions/latest/dg/connect-ecs.html
For example, suppose my Lambda function outputs this JSON:
{
  "commands": {
    "foo": [
      "some command 1",
      "some command 2"
    ]
  }
}
I want my Run Fargate Task to have an InputPath that selects only the "commands" object. In my state machine, after "Type": "Task", I will put:
"InputPath":"$.commands",
Then in my "Parameters" for my Fargate Task after "NetworkConfigurations:{....}," I will place the Container Overrides that I want using the JSON Path syntax: https://github.com/json-path/JsonPath. However, I don't want all the input from the JSON, just the value of "foo"
"Overrides": {
"ContainerOverrides": [
{
"Name": "container-name",
"Command.$": "$.commands.foo"
}
]
}
You can use the syntax used here: https://docs.aws.amazon.com/step-functions/latest/dg/connect-ecs.html
I have an SQS queue whose size I monitor from a state machine.
If the size is greater than the desired size, I trigger some Lambda functions; otherwise, it waits for 30 seconds and checks the queue size again.
Here is my problem: when the queue length is > 20,000 I want to trigger 10 Lambda functions to empty it faster, and if its length is < 2,000 then I want to run only 1 Lambda function.
For now, I have hard-coded ten parallel steps, but that is a waste of resources if the queue size is less than 2,000.
"CheckSize": {
"Type": "Choice",
"Choices": [
{
"Variable": "$.Payload.size",
"NumericGreaterThan": 2000,
"Next": "invoke_lambda"
},
{
"Variable": "$.Payload.size",
"NumericLessThan": 2000,
"Next": "Wait30s"
}
],
"Default": "Wait30s"
},
AWS Step Functions does not appear to be the best tool for your scenario. I think you should use one of the SQS metrics available in CloudWatch, which in your case would be ApproximateNumberOfMessagesVisible. You can create an alarm for ApproximateNumberOfMessagesVisible >= 20,000. The action for that alarm would probably be an SNS topic to which you subscribe a Lambda function. In that Lambda function you can asynchronously invoke, 10 times, the Lambda function that is supposed to clear down the queue.
Check out the AWS docs for creating a CloudWatch alarm for an SQS metric.
Using Step Functions:
If you want to do it with Step Functions, then I don't think you need any condition check in your state machine definition. All you need is to pass $.size to a Lambda function and put the condition in that function: if size >= 20,000, asynchronously invoke the queue-processing function 10 times, otherwise once. A sketch of such a function is shown below.
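A minimal Go sketch of that size-check function, assuming a hypothetical worker function named "queue-processor" (InvocationType "Event" makes the invocations asynchronous):

package main

import (
    "context"

    "github.com/aws/aws-lambda-go/lambda"
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    lambdasvc "github.com/aws/aws-sdk-go/service/lambda"
)

type Input struct {
    Size int `json:"size"`
}

func handler(ctx context.Context, in Input) error {
    svc := lambdasvc.New(session.Must(session.NewSession()))

    workers := 1
    if in.Size >= 20000 {
        workers = 10
    }
    for i := 0; i < workers; i++ {
        if _, err := svc.Invoke(&lambdasvc.InvokeInput{
            FunctionName:   aws.String("queue-processor"), // hypothetical name
            InvocationType: aws.String("Event"),           // async invoke
        }); err != nil {
            return err
        }
    }
    return nil
}

func main() {
    lambda.Start(handler)
}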
AWS Step Functions now supports dynamic parallelism, so you can optimize the performance and efficiency of application workflows such as data processing and task automation. By running identical tasks in parallel, you can achieve consistent execution durations and improve utilization of resources to save on operating costs. Step Functions automatically scales resources in response to your input.
https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-map-state.html
https://aws.amazon.com/about-aws/whats-new/2019/09/aws-step-functions-adds-support-for-dynamic-parallelism-in-workflows/
Without diving deep into the solution you have come up with, and focusing on guidance for your question:
You have essentially answered the question yourself. The simplest solution is to add one more step, called invoke_10_lambdas, and route to it from your Choice state. Pseudocode for your step function would look something like this:
....
....
"CheckSizeAndDivert": {
  "Type": "Choice",
  "Choices": [
    {
      "Variable": "$.Payload.size",
      "NumericGreaterThan": 20000,
      "Next": "invoke_10_lambdas"
    },
    {
      "Variable": "$.Payload.size",
      "NumericGreaterThan": 2000,
      "Next": "invoke_lambda"
    }
  ],
  "Default": "Wait30s"
},
"invoke_10_lambdas": {
  // This is your parallel step.
  ...
  "Next": "whatever comes next (I believe it is Wait30s)"
},
"invoke_lambda": {
  // This is your single-lambda step.
  ...
  "Next": "whatever comes next (I believe it is Wait30s)"
},
....
....
SQS now supports using Lambda as an event source (EventSourceMapping), so the recommendation would be to let AWS take control of this directly and scale Lambdas as necessary.
An example CloudFormation template would be:
"EventSourceMapping": {
"Type": "AWS::Lambda::EventSourceMapping",
"Properties": {
"BatchSize": 10,
"Enabled": true,
"EventSourceArn" : { "Fn::GetAtt" : ["SQSStandupWork", "Arn"] },
"FunctionName" : {
"Fn::Join": [
":", [
{ "Fn::GetAtt" : ["LambdaFunction", "Arn"] },
"production"
]
]
}
}
}
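With the mapping in place, Lambda polls the queue and invokes the function with batches of messages. A minimal Go handler sketch for the consuming function:

package main

import (
    "context"
    "log"

    "github.com/aws/aws-lambda-go/events"
    "github.com/aws/aws-lambda-go/lambda"
)

// handler receives up to BatchSize (10 above) messages per invocation.
func handler(ctx context.Context, event events.SQSEvent) error {
    for _, msg := range event.Records {
        log.Printf("processing message %s: %s", msg.MessageId, msg.Body)
        // ... do the work previously done by the queue-draining lambdas ...
    }
    return nil
}

func main() {
    lambda.Start(handler)
}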
If you are really set on using a step function to drive this forward, you can create another choice on top of what you currently have:
A - execute in parallel 1 lambda (A1 => stop) + a checker (B)
B - call a lambda that checks the size: go to Wait30 (B1) if the size is less than 2,000, or to Parallel (B2) if the size is greater than 20,000
B1 - wait 30 seconds, then NEXT: A
B2 - a Parallel state with 9 lambdas (since the 10th is A) => NEXT: A
Additional alternatives are:
a CloudWatch Events rule to schedule triggering every 30 seconds
triggering the 10 parallel Lambda functions directly from a separate Lambda. That Lambda could check the size and then invoke the others asynchronously; since the result doesn't matter (you'll check again in 30 seconds), the step function will simply retry.
The biggest problem with your suggested approach is that a step function execution has a one-year limit, so unless you are sure the queue will be drained within a year, you'll have a problem when you reach the end. Even if you set it up to retrigger a new step function, you'll pay for a lot of unnecessary state transitions (Step Functions is not the cheapest service).
I have created a simple AWS state machine with Lambda functions, like the one below:
{
  "Comment": "Validates data",
  "StartAt": "ChooseDocumentType",
  "States": {
    "ChooseDocumentType": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.documentType",
          "StringEquals": "RETURN",
          "Next": "ValidateReturn"
        },
        {
          "Variable": "$.documentType",
          "StringEquals": "ASSESSMENT",
          "Next": "ValidateAssessment"
        }
      ],
      "Default": "DefaultState"
    },
    "ValidateReturn": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-west-2:111111111:function:ValidateReturn",
      "Next": "DefaultState"
    },
    "ValidateAssessment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-west-2:111111111:function:ValidateAssessment",
      "Next": "DefaultState"
    },
    "DefaultState": {
      "Type": "Pass",
      "End": true
    }
  }
}
Questions:
1. How do I create stages for this state machine (like production, development, etc.)?
2. Each Lambda function has aliases pointing to different versions, so the development alias always points to $LATEST and the production alias points to, let's say, version 2. How do I dynamically associate the state machine's stages with these Lambda aliases, so that the state machine in the development stage uses the Lambda functions with the development alias, and so on?
I am using the AWS console to manage state machines and Lambdas, and I don't see any action to create stages for a state machine.
You can declare the alias and the version in the Lambda ARN:
# default, $LATEST
arn:aws:lambda:us-west-2:111111111:function:ValidateAssessment
# using alias
arn:aws:lambda:us-west-2:111111111:function:ValidateAssessment:development
# using version
arn:aws:lambda:us-west-2:111111111:function:ValidateAssessment:2
Use these in the Step Function definition according to your needs.
Re: #2, if your main concern is controlling which Lambda alias gets invoked, there is a way you can do that with a single step function.
Your step function state definition would be something like:
{
  "Type": "Task",
  "Resource": "arn:aws:states:::lambda:invoke",
  "Parameters": {
    "InvocationType": "RequestResponse",
    "FunctionName": "someFunction",
    "Qualifier.$": "$.lambdaAlias",
    "Payload": {}
  }
}
So where you execute the step function (and would specify the stage, if there were such a thing), you'd pass a lambdaAlias parameter. (There's nothing magical about that name; you can pull it from whatever step function input parameter you want.)
The request payload to your Lambda would go in Parameters.Payload.
https://docs.aws.amazon.com/step-functions/latest/dg/connect-lambda.html
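For completeness, a Go sketch of starting an execution with a per-stage alias in the input (the state machine ARN is hypothetical; the "lambdaAlias" key matches the Qualifier.$ path in the state definition above):

package main

import (
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/sfn"
)

func main() {
    svc := sfn.New(session.Must(session.NewSession()))

    // Pass the alias for the stage you want; the Task state reads it
    // via "Qualifier.$": "$.lambdaAlias".
    _, err := svc.StartExecution(&sfn.StartExecutionInput{
        StateMachineArn: aws.String("arn:aws:states:us-west-2:111111111:stateMachine:Validator"), // hypothetical ARN
        Input:           aws.String(`{"lambdaAlias": "development", "documentType": "RETURN"}`),
    })
    if err != nil {
        panic(err)
    }
}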