How can I passthrough the input to a Task state in an AWS Step Functions to the output?
After reading the Input and Output Processing page in the AWS docs, I have played with various combinations of InputPath, ResultPath and OutputPath.
State definition:
"First State": {
"Type": "Task",
"Resource": "[My Lambda ARN]",
"Next": "Second State",
"InputPath": "$.someKey",
"OutputPath": "$"
}
Input:
{
"someKey": "someValue"
}
Expected Result
I would like the output of the First State (and thus the input of Second State) to be
{
"someKey": "someValue"
}
Actual Result
[empty]
What if the input is more complicated, e.g.
{
"firstKey": "firstValue",
"secondKey": "secondValue"
}
I would like to forward all of it without worrying about (sub) paths.
In the Amazon States Language spec it is stated that:
If the value of ResultPath is null, that means that the state’s own raw output is discarded and its raw input becomes its result.
Consequently, I updated my state definition to
"First State": {
"Type": "Task",
"Resource": "[My Lambda ARN]",
"Next": "Second State",
"ResultPath": null
}
As a result, when passing the input example Task input payload will be copied to the output, even for rich objects like:
{
"firstKey": "firstValue",
"secondKey": "secondValue"
}
For those who find themselves here using CDK, the solution is to use the explicit aws_stepfunctions.JsonPath.DISCARD enum rather than None/null.
from aws_cdk import (
aws_stepfunctions,
aws_stepfunctions_tasks,
)
aws_stepfunctions_tasks.LambdaInvoke(
self,
"my_function",
lambda_function=lambda_function,
result_path=aws_stepfunctions.JsonPath.DISCARD,
)
https://docs.aws.amazon.com/cdk/api/latest/docs/#aws-cdk_aws-stepfunctions.JsonPath.html#static-discard
I was looking for a solution from passing input from one parallel state to another parallel state and the above option worked really good.
For example my step function is like this...tas1->parallel task2 -> parallel trask3 -> task4. So when it start with parallel task3, the input values are wiped out, so ptask3 is failing. With the above option, i was able to pass in same input from ptask2 to ptas3.
Related
I am running a step function with many different step yet I am still stuck on the 2nd step.
The first step is a Java Lambda that gets all the input parameters and does what it needs to do.
The lambda returns null as it doesn't need to return anything.
The next step is a call for API gateway which needs to use one of the parameters in the URL.
However, I see that neither the URL has the needed parameter nor do I actually get the parameters into the step. ("input": null under TaskStateEntered)
The API gateway step looks as follows: (I also tried "Payload.$": "$" instead of the "Input.$": "$")
"API Gateway start": {
"Type": "Task",
"Resource": "arn:aws:states:::apigateway:invoke",
"Parameters": {
"Input.$": "$",
"ApiEndpoint": "aaaaaa.execute-api.aa-aaaa-1.amazonaws.com",
"Method": "GET",
"Headers": {
"Header1": [
"HeaderValue1"
]
},
"Stage": "start",
"Path": "/aaa/aaaa/aaaaa/aaaa/$.scenario",
"QueryParameters": {
"QueryParameter1": [
"QueryParameterValue1"
]
},
"AuthType": "IAM_ROLE"
},
"Next": "aaaaaa"
},
But when my step function gets to this stage it fails and I see the following in the logs:
{
"name": "API Gateway start",
"input": null,
"inputDetails": {
"truncated": false
}
}
And eventually:
{
"error": "States.Runtime",
"cause": "An error occurred while executing the state 'API Gateway start' (entered at the event id #9). Unable to apply Path transformation to null or empty input."
}
What am I missing here? Note that part of the path is a value that I enter at the step function execution. ("Path": "/aaa/aaaa/aaaaa/aaaa/$.scenario")
EDIT:
As requested by #lynkfox, I am adding the lambda definition that comes before the API gateway step:
And to answer the question, yes its standard and I see no input.
"Run tasks": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"OutputPath": "$.Payload",
"Parameters": {
"Payload.$": "$",
"FunctionName": "arn:aws:lambda:aaaaaa-1:12345678910:function:aaaaaaa-aaa:$LATEST"
},
"Retry": [
{
"ErrorEquals": [
"Lambda.ServiceException",
"Lambda.AWSLambdaException",
"Lambda.SdkClientException"
],
"IntervalSeconds": 2,
"MaxAttempts": 6,
"BackoffRate": 2
}
],
"Next": "API Gateway start"
},
So yes, as I commented, I believe the problem is the OutputPath of your lambda task definition. What this is saying is Take whatever comes out of this lambda (which is nothing!) and cut off everything other than the key Payload.
Well you are returning nothing, so this causes nothing to be sent to the next task.
I am assuming your incoming vent already has a key in the Json that is named Payload, so what you want to do is remove the OutputPath from your lambda. It doesn't need to return anything so it doesn't need an Output or Result path.
Next, on your API task, assuming again that your initializing event has a key of Payload, you would have "InputPath": "$.Payload" - if you have your headers or parameters in the initializing json Event then, you can reference those keys in the Parameters section of the definition.
Every AWS Service begins with an Event and ends with an Event. Each Event is a JSON object. (Which I'm sure you know). With State Machines, this continues - the State Machine/Step Function is just the controller for passing Events from one Task to the next.
So any given task can have an InputPath, OutputPath, or Result Path - These three definition parameters can decide what values go into the Task and what are sent onto the Next Task. State machines are, by definition, for maintaining State between Tasks, and these help control that 'State' (and there is pretty much only one 'state' at any given time, the event heading to the next Task(s)
The ResultPath is where, in that overall Event, the task puts the data. If you put ResultPath: "$.MyResult" by itself it appends this key to the incoming event
If you add OutputPath, it ONLY passes that key from the output event of the Task onto the next step in the Step Functions.
These three give you a lot of control.
Want to Take an Event into a Lambda and respond with something completely different - you don't need the incoming data - you combine OutputPath and ResultPath with the same value (and your Lambda needs to respond with a Json Object) then you can replace the event wholesale.
If you have ResultPath of some value and OutputPath: "$." you create a new json object with a single Key that contains the result of your task (the key being the definition set in ResultPath
InputPath allows you to set what goes into the Task. I am not 100% certain but I'm pretty sure it does not remove anything from the next Task in the chain.
More information can be found here but it can get pretty confusing.
My quick guide:
ResultPath by itself if you want to append the data to the event
ResultPath + OutputPath of the same value if you want to cut off the Input and only have the output of the task continue (and it returns a JSON style object)
Basically, i have the following input:
{
"name": "abc",
"choice": "choice1"
}
My dynamoDB table has the following structure:
Partition key - "name"
Complex json with choices:
{
"choices":
{
"choice1": ......,
"choice2": ......
}
}
I want to directly read from dynamodb, and get a subitem under the relevant choice:
{
"StartAt": "Read Next Message from DynamoDB",
"States": {
"Read Next Message from DynamoDB": {
"Type": "Task",
"Resource": "arn:aws:states:::dynamodb:getItem",
"Parameters": {
"TableName": "my_table",
"Key": {
"customerName": {"S.$": "$.name"}
}
},
"OutputPath": "$.Item.choices.M.choice1.M.myvalue.S",
"Next": "World"
},
"World": {
"Type": "Pass",
"End": true
}
}
}
basically i want to do something like "$.Item.choices.M.{$.choice}.M.myvalue.S", and take one of the output's keys from the input. is this possible?
I think what you're looking for is JsonPath interpolation, but that is not supported as per this thread on AWS forums.
As far as I know Step Functions allow only path reference through $, . and [] operators (Reference Path).
I don't know how much control you have on the DynamoDB table's data but I think your problem can be solved easily if your choice types are modeled in following way
{
"choices": [{
"choiceType": "choice1",
........
},
{
"choiceType": "choice2",
........
}]
}
Now you can use the map state to iterate over the choices array. Note that don't forget to pass the expected choiceType to each iteration.
First state of the map iterator can be a choice state which compares choiceType and moves to appropriate next state. So, basically your rest of the workflow is modeled as iterator of the map state in step 1.
Now, if you don't have the control over DynamoDB table, then you can process the query result in an AWS Lambda.
I am trying to use AWS Step Functions to trigger operations many S3 files via Lambda. To do this I am invoking a step function with an input that has a base S3 key of the file and part numbers each file (each parallel iteration would operate on a different S3 file). The input looks something like
{
"job-spec": {
"base_file_name": "some_s3_key-",
"part_array": [
"part-0000.tsv",
"part-0001.tsv",
"part-0002.tsv", ...
]
}
}
My Step function is very simple, takes that input and maps it out, however I can't seem to get both the file and the array as input to my lambda. Here is my step function definition
{
"Comment": "An example of the Amazon States Language using a map state to process elements of an array with a max concurrency of 2.",
"StartAt": "Map",
"States": {
"Map": {
"Type": "Map",
"ItemsPath": "$.job-spec",
"ResultPath": "$.part_array",
"MaxConcurrency": 2,
"Next": "Final State",
"Iterator": {
"StartAt": "My Stage",
"States": {
"My Stage": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "arn:aws:lambda:us-east-1:<>:function:some-lambda:$LATEST",
"Payload": {
"Input.$": "$.part_array"
}
},
"End": true
}
}
}
},
"Final State": {
"Type": "Pass",
"End": true
}
}
}
As written above it complains that that job-spec is not an array for the ItemsPath. If I change that to $.job-spec.array I get the array I'm looking for in my lambda but the base key is missing.
Essentially I want each python lambda to get the base file key and one entry from the array to stitch together the complete file name. I can't just put the complete file names in the array due to the limit limit of how much data I can pass around in Step Functions and that also seems like a waste of data
It looks like the Parameters value can be used for this but I can't quite get the syntax right
Was able to finally get the syntax right.
"ItemsPath": "$.job-spec.part_array",
"Parameters": {
"part_name.$": "$$.Map.Item.Value",
"base_file_name.$": "$.job-spec.base_file_name"
},
It seems that Parameters can be used to create custom inputs for each stage. The $$ is accessing the context of the stage and not the actual input. It appears that ItemsPath takes the array and puts it into a context which can be used later.
UPDATE Here is some AWS Documentation showing this being used from the comments below
How can I pass my input to my output in a task in AWS Step Functions?
I'm aware of this question, and the docs:
If the value of ResultPath is null, that means that the state’s own raw output is discarded and its raw input becomes its result.
But what I need is:
Given my input
{
"input": "my_input"
}
And my lambda output
{
"output": "my_output"
}
I need to pass to the next state the following json:
{
"input": "my_input",
"output": "my_output"
}
Two suggestions comes to mind, either Use ResultPath to Replace the Input with the Result, which allows you to
If you don't specify a ResultPath, the default behavior is as if you had specified "ResultPath": "$". Because this tells the state to replace the entire input with the result, the state input is completely replaced by the result coming from the task result.
For this option to work, the Lambda function must return the desired response:
{
"input": "my_input",
"output": "my_output"
}
Alternatively Use ResultPath to Include the Result with the Input in the Step Functions developer guide. Next, if if you change the return value from you Lambda to include just "my_output" you can specify "ResultPath": "$.output" to achieve the desired result:
{
"input": "my_input",
"output": "my_output"
}
For anyone who comes to this question, it's not possible to do what I asked, but what you can do is, according to the docs:
{
"States": {
"my-state": {
"Type": "task",
"ResultPath": "$.response"
}
}
}
This will append whatever the lambda returns into the response node, resulting in:
{
"input": "my_input",
"response": {
"output": "my_output"
}
}
For anyone coming to this question:
It is possible to do this using ResultSelector and Intrinsic Functions and Context Object.
There is no need to change output.
With input given as:
{ "input_key": "input_value" }
and state output given as:
{ "output_key": "output_value" }
then in the state add the following fields:
the $.Payload field comes from lambda's output, if the state is not a lambda call, then check the output to merge the proper key
"ResultSelector": {"response.$": "States.JsonMerge($.Payload,
$$.Execution.Input, false)"}, "OutputPath": "$.response"
as a result the task's output will contain the input and output merged flat, without some weird nested dictionaries, exactly as requested
I have a AWS Step Function State formatted as follows:
"MyState": {
"Type": "Task",
"Resource": "<MyLambdaARN>",
"ResultPath": "$.value1"
"Next": "NextState"
}
I want to add a second value but can't find out how anywhere. None of the AWS examples display multiple ResultPath values being added to the output.
Would I just add a comma between them?
"MyState": {
"Type": "Task",
"Resource": "<MyLambdaARN>",
"ResultPath": "$.value1, $.value2"
"Next": "NextState"
}
Or is there a better way to format these?
Let's answer this straight up: you can't specify multiple ResultPath values, because it doesn't make sense. Amazon does do a pretty bad job of explaining how this works, so I understand why this is confusing.
You can, however, return multiple result values from a State in your State Machine.
General Details
The input to any State is a JSON object. The output of the State is a JSON object.
ResultPath directs the State Machine what to do with the output (result) of the State. Without specifying ResultPath, it defaults to $ which means all the input to the State is lost, replaced by the output of the State.
If you want to allow data from the input JSON to pass through your State, you specify a ResultPath to describe a property to add/overwrite on the input JSON to pass to the next State.
Your scenario
In your case, $.value1 means the output JSON of your State is the input JSON with a new/overwritten property value1 containing the output JSON of your lambda.
If you want multiple values in your output, your lambda should return a JSON object containing the multiple values, which will be the value of the value1 property.
If you don't care about allowing input values passing through your State, leave the ResultPath as the default $ by omitting it. The output JSON containing your multiple values will be the input to the next State.
Support scenario
Here's a simple State machine I use to play with the inputs and outputs:
{
"StartAt": "State1",
"States": {
"State1": {
"Type": "Pass",
"Result": { "Value1": "Yoyo", "Value2": 1 },
"ResultPath": "$.Result",
"Next": "State2"
},
"State2": {
"Type": "Pass",
"Result": { "Value2": 5 },
"ResultPath": "$.Result",
"Next": "State3"
},
"State3": {
"Type": "Pass",
"Result": "Done",
"End": true
}
}
}
Execute this with the following input:
{
"Input 1": 10000,
"Input 2": "YOLO",
"Input 3": true
}
Examine the inputs and outputs of each Stage. You should observe the following:
The input is passed all the way through, because the ResultPath always directs output to a Result property of the input.
The output of State1 is overwritten by the Output of State2. The net effect is Result.Value1 disappears and Result.Value2 is "updated".
Hopefully this clarifies how to use ResultPath effectively.
You cannot specify several values in ResultPath, because ResultPath defines the path of your result value in the json. The close analogy for ResultPath is a return value of a function, as your step can return only 1 value it should be put into 1 node in the resulting json.
If you have an input json
{
"myValue": "value1",
"myArray": [1,2,3]
}
And define your ResultPath as $.myResult the overall resulting json will be
{
"myValue": "value1",
"myArray": [1,2,3],
"myResult": "result"
}
Now you can truncate this json to pass only part of it to the next step in your function using OutputPath (e.g. OutputPath: "$.myResult")
InputPath and OutputPath can have several nodes in their definition, but ResultPath should always have only 1 node.
Seems like today it can be done by using ResultsSelector.
I think it would be like the following, but I haven't actually done this so I can't say for sure.
"ResultPath": {
"var1" : "$.value1",
"var2" : "$.value2"
},
=====
After looking further into this, I am convinced that there is no direct way to do what you want to do. Here is a way that could give you the results that you want.
1) You would omit the InputPath, OutputPath, and ResultPath from your step. This would mean that all of $. would be passed in as an input to your step function and that all of the output from the lambda function would be stored as $. In the lambda function you could set the results fields to be whatever you want them to be. The lambda function must return the modified input as it output.