I am using the AWS CDK to create a state machine that sends a message to a FIFO queue and waits for a callback from the lambda worker to continue execution.
I would like the messages sent to the FIFO queue to have a dynamic MessageGroupId so I can control the number of lambda workers processing them. The only way I can think of to get a dynamic MessageGroupId is to reference a parameter of the step function input with JsonPath, but I have not come across any documentation about it. My initial tests failed: the literal string "$.MessageGroupId" was passed through, which gave every message the same message group ID and thus a single lambda worker.
Is it possible to dynamically assign a message group ID to an SQS message when it is sent from a step function?
If so, how?
With the help of AWS Support, I managed to do it either by using the Context Object or by passing an ID in the initial input and referencing it with $.
Here's an example:
{
"Comment": "Generate unique MessageGroupId",
"StartAt": "Start",
"States": {
"Start": {
"Type": "Task",
"TimeoutSeconds": 60,
"Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
"Parameters": {
"QueueUrl": "<YOUR_QUEUE_URL>",
"MessageBody": {
"Input.$": "$",
"TaskToken.$": "$$.Task.Token"
},
"MessageGroupId.$": "$$.Execution.Id"
},
"ResultPath": "$",
"End": true
}
}
}
My problem was that I was trying to set MessageGroupId like so:
"MessageGroupId": "$$.Execution.Id"
Where I should have done:
"MessageGroupId.$": "$$.Execution.Id"
Appending .$ to the key makes Step Functions resolve the expression "$$.Execution.Id" instead of passing the literal string "$$.Execution.Id".
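To make the difference concrete, here is a toy Python model of how a Parameters block might be resolved. It is only a sketch: resolve_parameters, lookup, and the sample execution ID are hypothetical names and values made up for illustration, and it handles only simple dotted paths, not full JsonPath.

```python
def lookup(document, dotted_path):
    """Follow a simple dotted path such as '.Execution.Id' through nested dicts."""
    value = document
    for key in filter(None, dotted_path.split(".")):
        value = value[key]
    return value


def resolve_parameters(parameters, state_input, context):
    """Toy model: keys ending in '.$' are resolved, all other values stay literal."""
    resolved = {}
    for key, value in parameters.items():
        if isinstance(value, dict):
            resolved[key] = resolve_parameters(value, state_input, context)
        elif key.endswith(".$"):
            # '$$' addresses the Context Object, a single '$' the state input.
            if value.startswith("$$"):
                resolved[key[:-2]] = lookup(context, value[2:])
            else:
                resolved[key[:-2]] = lookup(state_input, value[1:])
        else:
            resolved[key] = value
    return resolved


# Hypothetical Context Object for one execution (made-up ARN).
context = {"Execution": {"Id": "arn:aws:states:us-east-1:123456789012:execution:demo:run-1"}}

literal = resolve_parameters({"MessageGroupId": "$$.Execution.Id"}, {}, context)
dynamic = resolve_parameters({"MessageGroupId.$": "$$.Execution.Id"}, {}, context)
print(literal["MessageGroupId"])  # the unresolved expression text
print(dynamic["MessageGroupId"])  # the execution ID taken from the context
```

In this model, only the second call yields a value that differs per execution, which is what spreads messages over several message groups.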
Let's say I have this state machine in AWS Step Function:
And I had started it with this input:
{
"item1": 1,
"item2": 2,
"item3": 3
}
It's clear for me that Action A is receiving the input payload. But, how can Action C access the state machine input to get the value of item3? Is it possible?
Thanks!!
Typically, the data available in Action C will be dependent on what the result/output of Action B is.
However, if you just care about the original input to the state machine execution, you can set the payload of Action C using the Context Object.
// roughly
"Action C": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"Payload.$": "$$.Execution.Input",
"FunctionName": "<action c lambda>"
}
}
Check out the AWS documentation for Context Object
I am running a step function with many different steps, yet I am stuck on the 2nd step.
The first step is a Java Lambda that gets all the input parameters and does what it needs to do.
The lambda returns null as it doesn't need to return anything.
The next step is a call for API gateway which needs to use one of the parameters in the URL.
However, I see that neither the URL has the needed parameter nor do I actually get the parameters into the step. ("input": null under TaskStateEntered)
The API gateway step looks as follows: (I also tried "Payload.$": "$" instead of the "Input.$": "$")
"API Gateway start": {
"Type": "Task",
"Resource": "arn:aws:states:::apigateway:invoke",
"Parameters": {
"Input.$": "$",
"ApiEndpoint": "aaaaaa.execute-api.aa-aaaa-1.amazonaws.com",
"Method": "GET",
"Headers": {
"Header1": [
"HeaderValue1"
]
},
"Stage": "start",
"Path": "/aaa/aaaa/aaaaa/aaaa/$.scenario",
"QueryParameters": {
"QueryParameter1": [
"QueryParameterValue1"
]
},
"AuthType": "IAM_ROLE"
},
"Next": "aaaaaa"
},
But when my step function gets to this stage it fails and I see the following in the logs:
{
"name": "API Gateway start",
"input": null,
"inputDetails": {
"truncated": false
}
}
And eventually:
{
"error": "States.Runtime",
"cause": "An error occurred while executing the state 'API Gateway start' (entered at the event id #9). Unable to apply Path transformation to null or empty input."
}
What am I missing here? Note that part of the path is a value that I enter at the step function execution. ("Path": "/aaa/aaaa/aaaaa/aaaa/$.scenario")
EDIT:
As requested by @lynkfox, I am adding the lambda definition that comes before the API gateway step:
And to answer the question: yes, it's standard, and I see no input.
"Run tasks": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"OutputPath": "$.Payload",
"Parameters": {
"Payload.$": "$",
"FunctionName": "arn:aws:lambda:aaaaaa-1:12345678910:function:aaaaaaa-aaa:$LATEST"
},
"Retry": [
{
"ErrorEquals": [
"Lambda.ServiceException",
"Lambda.AWSLambdaException",
"Lambda.SdkClientException"
],
"IntervalSeconds": 2,
"MaxAttempts": 6,
"BackoffRate": 2
}
],
"Next": "API Gateway start"
},
So yes, as I commented, I believe the problem is the OutputPath in your lambda task definition. It says: take whatever comes out of this lambda (which is nothing!) and cut off everything other than the key Payload.
Well, you are returning nothing, so nothing is sent to the next task.
I am assuming your incoming event already has a key in the JSON named Payload, so what you want to do is remove the OutputPath from your lambda. It doesn't need to return anything, so it doesn't need an OutputPath or ResultPath.
Next, on your API task, assuming again that your initializing event has a key of Payload, you would have "InputPath": "$.Payload". If you have your headers or parameters in the initializing JSON event, you can then reference those keys in the Parameters section of the definition.
Every AWS Service begins with an Event and ends with an Event. Each Event is a JSON object. (Which I'm sure you know). With State Machines, this continues - the State Machine/Step Function is just the controller for passing Events from one Task to the next.
So any given task can have an InputPath, OutputPath, or ResultPath. These three definition parameters decide what values go into the Task and what is sent on to the next Task. State machines are, by definition, for maintaining state between Tasks, and these parameters help control that state (there is pretty much only one 'state' at any given time: the event heading to the next Task or Tasks).
The ResultPath is where, in that overall Event, the task puts its data. If you put ResultPath: "$.MyResult" by itself, it appends this key to the incoming event.
If you add OutputPath, it passes ONLY that key from the output event of the Task on to the next step in the Step Function.
These three give you a lot of control.
Want to take an Event into a Lambda and respond with something completely different because you don't need the incoming data? Combine OutputPath and ResultPath with the same value (your Lambda needs to respond with a JSON object) and you replace the event wholesale.
If you have a ResultPath of some value and OutputPath: "$." you create a new JSON object with a single key that contains the result of your task (the key being the one set in ResultPath).
InputPath allows you to set what goes into the Task. I am not 100% certain but I'm pretty sure it does not remove anything from the next Task in the chain.
More information can be found here but it can get pretty confusing.
My quick guide:
ResultPath by itself if you want to append the data to the event
ResultPath + OutputPath of the same value if you want to cut off the Input and only have the output of the task continue (and it returns a JSON style object)
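The quick guide above can be tried out with a toy Python model. This is only a sketch under stated assumptions: apply_result_path and apply_output_path are hypothetical helpers that understand "$", a single top-level key such as "$.MyResult", or None; the task result is an assumed, abbreviated shape for a Lambda invocation that returned null.

```python
import copy


def apply_result_path(state_input, result, result_path):
    """Toy model of ResultPath for '$', '$.key', or None only."""
    if result_path is None:   # discard the result, keep the input
        return copy.deepcopy(state_input)
    if result_path == "$":    # default: the result replaces the input
        return result
    merged = copy.deepcopy(state_input)
    merged[result_path[2:]] = result
    return merged


def apply_output_path(document, output_path):
    """Toy model of OutputPath for '$' or a single top-level key."""
    return document if output_path == "$" else document[output_path[2:]]


event = {"scenario": "demo"}
# Assumed, abbreviated result of a Lambda task whose function returned null.
task_result = {"Payload": None, "StatusCode": 200}

# ResultPath by itself appends the result to the incoming event.
appended = apply_result_path(event, task_result, "$.MyResult")

# ResultPath + OutputPath of the same value keeps only the task result.
replaced = apply_output_path(appended, "$.MyResult")

# The failing definition: default ResultPath, then OutputPath "$.Payload".
next_input = apply_output_path(apply_result_path(event, task_result, "$"), "$.Payload")
print(appended)
print(replaced)
print(next_input)
```

In this model the last case hands None to the next state, which matches the "input": null seen under TaskStateEntered in the question.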
I have a step function that publishes to an SNS topic, which then sends an email notification. The email notification is sent as expected, but then the task gets stuck in "running" state when it should exit and terminate the step function. Does anyone know where I'm going wrong or what might be causing this?
"ErrorNotification": {
"Type": "Task",
"Resource":"arn:aws:states:::sns:publish.waitForTaskToken",
"OutputPath": "$",
"Parameters": {
"TopicArn": "<topic-arn>",
"Message":{
"Input.$":"$",
"TaskToken.$":"$$.Task.Token"
}
},
"End": true
},
This specific line
"Resource":"arn:aws:states:::sns:publish.waitForTaskToken",
implements a Wait for a Callback with the Task Token
Call Amazon SNS with Step Functions
The following includes a Task state that publishes to an Amazon SNS topic and then waits for the task token to be returned. See Wait for a Callback with the Task Token.
{
"StartAt":"Send message to SNS",
"States":{
"Send message to SNS":{
"Type":"Task",
"Resource":"arn:aws:states:::sns:publish.waitForTaskToken",
"Parameters":{
"TopicArn":"arn:aws:sns:us-east-1:123456789012:myTopic",
"Message":{
"Input.$":"$",
"TaskToken.$":"$$.Task.Token"
}
},
"End":true
}
}
}
In that case, you need to check whether the component that handles the callback (usually a lambda) is sending the appropriate event with the final response back to Step Functions.
For example, I handle my callback in a lambda roughly like the one below, for both the successful and the failed case.
...
LOG.info(f"Sending task heartbeat for task ID {body['taskToken']}")
STEP_FUNCTIONS_CLIENT.send_task_heartbeat(taskToken=body["taskToken"])
is_task_success = random.choice([True, False])
if is_task_success:
    LOG.info(f"Sending task success for task ID {body['taskToken']}")
    STEP_FUNCTIONS_CLIENT.send_task_success(
        taskToken=body["taskToken"],
        output=json.dumps({"id": body['id']})
    )
else:
    LOG.info(f"Sending task failure for task ID {body['taskToken']}")
    STEP_FUNCTIONS_CLIENT.send_task_failure(
        taskToken=body["taskToken"],
        cause="Random choice returned False."
    )
..
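For the SNS case in the question, a subscriber has to pull the token out of the published message before it can call back. The following is a minimal sketch, assuming a Lambda function is subscribed to the topic and that the Message object from the state definition arrives as JSON text. complete_task and RecordingClient are hypothetical names; RecordingClient only stands in for boto3.client("stepfunctions") so the sketch runs without AWS access.

```python
import json


def complete_task(sfn_client, sns_event):
    """Report success for every task token found in an SNS-triggered Lambda event."""
    for record in sns_event["Records"]:
        # Assumption: the "Message" object from the state definition arrives as JSON text.
        message = json.loads(record["Sns"]["Message"])
        sfn_client.send_task_success(
            taskToken=message["TaskToken"],
            output=json.dumps({"status": "notified"}),
        )


class RecordingClient:
    """Offline stand-in for boto3.client("stepfunctions"); records calls only."""

    def __init__(self):
        self.calls = []

    def send_task_success(self, **kwargs):
        self.calls.append(kwargs)


# Hand-written event in the shape Lambda receives from an SNS subscription.
fake_event = {
    "Records": [
        {"Sns": {"Message": json.dumps({"Input": {}, "TaskToken": "token-123"})}}
    ]
}
client = RecordingClient()
complete_task(client, fake_event)
print(client.calls)
```

Note that the key is "TaskToken" here because that is the name used in the state definition above. With only an email subscription, nothing ever makes this call, so the task keeps waiting.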
I am trying to use AWS Step Functions to trigger operations on many S3 files via Lambda. To do this I am invoking a step function with an input that has the base S3 key of the files and a part name for each file (each parallel iteration would operate on a different S3 file). The input looks something like
{
"job-spec": {
"base_file_name": "some_s3_key-",
"part_array": [
"part-0000.tsv",
"part-0001.tsv",
"part-0002.tsv", ...
]
}
}
My Step function is very simple, takes that input and maps it out, however I can't seem to get both the file and the array as input to my lambda. Here is my step function definition
{
"Comment": "An example of the Amazon States Language using a map state to process elements of an array with a max concurrency of 2.",
"StartAt": "Map",
"States": {
"Map": {
"Type": "Map",
"ItemsPath": "$.job-spec",
"ResultPath": "$.part_array",
"MaxConcurrency": 2,
"Next": "Final State",
"Iterator": {
"StartAt": "My Stage",
"States": {
"My Stage": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "arn:aws:lambda:us-east-1:<>:function:some-lambda:$LATEST",
"Payload": {
"Input.$": "$.part_array"
}
},
"End": true
}
}
}
},
"Final State": {
"Type": "Pass",
"End": true
}
}
}
As written above, it complains that job-spec is not an array for the ItemsPath. If I change that to $.job-spec.part_array I get the array I'm looking for in my lambda, but the base key is missing.
Essentially I want each Python lambda to get the base file key and one entry from the array so it can stitch together the complete file name. I can't just put the complete file names in the array because of the limit on how much data I can pass around in Step Functions, and that also seems like a waste of data.
It looks like the Parameters value can be used for this, but I can't quite get the syntax right.
Was able to finally get the syntax right.
"ItemsPath": "$.job-spec.part_array",
"Parameters": {
"part_name.$": "$$.Map.Item.Value",
"base_file_name.$": "$.job-spec.base_file_name"
},
It seems that Parameters can be used to create custom inputs for each stage. The $$ is accessing the context of the stage and not the actual input. It appears that ItemsPath takes the array and puts it into a context which can be used later.
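As a rough illustration of what each iteration ends up with, here is a toy Python model of that ItemsPath and Parameters combination. It is a sketch, not the service's implementation: map_iteration_inputs and handler are hypothetical names, and the handler assumes the Lambda receives the two keys directly as its event.

```python
def map_iteration_inputs(state_input):
    """Toy model of the Map state's ItemsPath plus Parameters shown above."""
    job_spec = state_input["job-spec"]
    inputs = []
    for item in job_spec["part_array"]:  # ItemsPath: $.job-spec.part_array
        inputs.append({
            "part_name": item,                             # $$.Map.Item.Value
            "base_file_name": job_spec["base_file_name"],  # $.job-spec.base_file_name
        })
    return inputs


def handler(event, context=None):
    """Hypothetical Lambda body: stitch the full S3 key back together."""
    return event["base_file_name"] + event["part_name"]


state_input = {
    "job-spec": {
        "base_file_name": "some_s3_key-",
        "part_array": ["part-0000.tsv", "part-0001.tsv"],
    }
}
keys = [handler(item) for item in map_iteration_inputs(state_input)]
print(keys)
```

Each iteration carries only the shared base key and one part name, so the full file names never need to be passed in the execution input.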
UPDATE Here is some AWS Documentation showing this being used from the comments below
How can I passthrough the input to a Task state in an AWS Step Functions to the output?
After reading the Input and Output Processing page in the AWS docs, I have played with various combinations of InputPath, ResultPath and OutputPath.
State definition:
"First State": {
"Type": "Task",
"Resource": "[My Lambda ARN]",
"Next": "Second State",
"InputPath": "$.someKey",
"OutputPath": "$"
}
Input:
{
"someKey": "someValue"
}
Expected Result
I would like the output of the First State (and thus the input of Second State) to be
{
"someKey": "someValue"
}
Actual Result
[empty]
What if the input is more complicated, e.g.
{
"firstKey": "firstValue",
"secondKey": "secondValue"
}
I would like to forward all of it without worrying about (sub) paths.
In the Amazon States Language spec it is stated that:
If the value of ResultPath is null, that means that the state’s own raw output is discarded and its raw input becomes its result.
Consequently, I updated my state definition to
"First State": {
"Type": "Task",
"Resource": "[My Lambda ARN]",
"Next": "Second State",
"ResultPath": null
}
As a result, the Task's input payload is copied to the output, even for rich objects like:
{
"firstKey": "firstValue",
"secondKey": "secondValue"
}
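If the definition is generated from plain Python rather than written by hand, the null has to survive serialisation. A minimal sketch, assuming the state is built as an ordinary dict and dumped with the standard json module (first_state and definition_fragment are names chosen for this example):

```python
import json

first_state = {
    "Type": "Task",
    "Resource": "[My Lambda ARN]",
    "Next": "Second State",
    "ResultPath": None,  # serialised as JSON null, not dropped
}

definition_fragment = json.dumps({"First State": first_state}, indent=2)
print(definition_fragment)
```

The key must stay in the dict with the value None. Leaving the key out is not the same thing, because an absent ResultPath falls back to the default of "$".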
For those who find themselves here using CDK, the solution is to use the explicit aws_stepfunctions.JsonPath.DISCARD constant rather than None/null.
from aws_cdk import (
    aws_stepfunctions,
    aws_stepfunctions_tasks,
)
aws_stepfunctions_tasks.LambdaInvoke(
    self,
    "my_function",
    lambda_function=lambda_function,
    result_path=aws_stepfunctions.JsonPath.DISCARD,
)
https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-stepfunctions.JsonPath.html#static-discard
I was looking for a way to pass input from one parallel state to another parallel state, and the above option worked really well.
For example, my step function looks like this: task1 -> parallel task2 -> parallel task3 -> task4. When parallel task3 starts, the input values have been wiped out, so it fails. With the above option, I was able to pass the same input from parallel task2 to parallel task3.