AWS Step Functions parameters are not moving to the next step

I am running a step function with many different steps, yet I am still stuck on the second step.
The first step is a Java Lambda that gets all the input parameters and does what it needs to do.
The lambda returns null as it doesn't need to return anything.
The next step is a call for API gateway which needs to use one of the parameters in the URL.
However, I see that neither the URL has the needed parameter nor do I actually get the parameters into the step. ("input": null under TaskStateEntered)
The API gateway step looks as follows: (I also tried "Payload.$": "$" instead of the "Input.$": "$")
"API Gateway start": {
"Type": "Task",
"Resource": "arn:aws:states:::apigateway:invoke",
"Parameters": {
"Input.$": "$",
"ApiEndpoint": "aaaaaa.execute-api.aa-aaaa-1.amazonaws.com",
"Method": "GET",
"Headers": {
"Header1": [
"HeaderValue1"
]
},
"Stage": "start",
"Path": "/aaa/aaaa/aaaaa/aaaa/$.scenario",
"QueryParameters": {
"QueryParameter1": [
"QueryParameterValue1"
]
},
"AuthType": "IAM_ROLE"
},
"Next": "aaaaaa"
},
But when my step function gets to this stage it fails and I see the following in the logs:
{
"name": "API Gateway start",
"input": null,
"inputDetails": {
"truncated": false
}
}
And eventually:
{
"error": "States.Runtime",
"cause": "An error occurred while executing the state 'API Gateway start' (entered at the event id #9). Unable to apply Path transformation to null or empty input."
}
What am I missing here? Note that part of the path is a value that I enter at the step function execution. ("Path": "/aaa/aaaa/aaaaa/aaaa/$.scenario")
EDIT:
As requested by @lynkfox, I am adding the lambda definition that comes before the API gateway step:
And to answer the question, yes, it's standard and I see no input.
"Run tasks": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"OutputPath": "$.Payload",
"Parameters": {
"Payload.$": "$",
"FunctionName": "arn:aws:lambda:aaaaaa-1:12345678910:function:aaaaaaa-aaa:$LATEST"
},
"Retry": [
{
"ErrorEquals": [
"Lambda.ServiceException",
"Lambda.AWSLambdaException",
"Lambda.SdkClientException"
],
"IntervalSeconds": 2,
"MaxAttempts": 6,
"BackoffRate": 2
}
],
"Next": "API Gateway start"
},

So yes, as I commented, I believe the problem is the OutputPath of your Lambda task definition. What it says is: take whatever comes out of this Lambda (which is nothing!) and cut off everything other than the key Payload.
Well, you are returning nothing, so nothing is sent to the next task.
I am assuming your incoming event already has a key in the JSON that is named Payload, so what you want to do is remove the OutputPath from your Lambda task. It doesn't need to return anything, so it doesn't need an OutputPath; if you want the original input to reach the next step untouched, set "ResultPath": null so that the (empty) result is discarded instead of replacing the input.
Next, on your API task, assuming again that your initializing event has a key of Payload, you would have "InputPath": "$.Payload". If you have your headers or parameters in the initializing JSON event, you can then reference those keys in the Parameters section of the definition.
Every AWS Service begins with an Event and ends with an Event. Each Event is a JSON object. (Which I'm sure you know). With State Machines, this continues - the State Machine/Step Function is just the controller for passing Events from one Task to the next.
So any given task can have an InputPath, OutputPath, or ResultPath. These three definition parameters decide what values go into the Task and what is sent on to the next Task. State machines are, by definition, for maintaining state between Tasks, and these help control that 'state' (and there is pretty much only one 'state' at any given time: the event heading to the next Task(s)).
The ResultPath is where, in that overall Event, the task puts its data. If you put "ResultPath": "$.MyResult" by itself, it appends this key to the incoming event.
If you add OutputPath, it passes ONLY that key from the output event of the Task on to the next step in the Step Function.
These three give you a lot of control.
Want to take an Event into a Lambda and respond with something completely different, because you don't need the incoming data? Combine OutputPath and ResultPath with the same value (your Lambda needs to respond with a JSON object) and you replace the event wholesale.
If you have ResultPath of some value and OutputPath: "$." you create a new JSON object with a single key that contains the result of your task (the key being the one set in ResultPath).
InputPath allows you to set what goes into the Task. I am not 100% certain, but I'm pretty sure it does not remove anything from what reaches the next Task in the chain.
More information can be found here but it can get pretty confusing.
My quick guide:
ResultPath by itself if you want to append the data to the event
ResultPath + OutputPath of the same value if you want to cut off the Input and only have the output of the task continue (and it returns a JSON style object)
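To make the ResultPath/OutputPath interplay above concrete, here is a minimal Python sketch that imitates the two steps for the Lambda task from the question. It is not how Step Functions is implemented: the helper names (select, run_task_state) are made up, only "$" and simple "$.key" paths are handled, the "scenario" value is a placeholder, and the Lambda result is reduced to the two fields that matter here.

```python
import copy

def select(path, data):
    """Resolve a tiny JSONPath subset: "$" or "$.key.subkey"."""
    if path == "$":
        return data
    node = data
    for key in path[2:].split("."):
        node = node[key]
    return node

def run_task_state(raw_input, task_result, result_path="$", output_path="$"):
    """Apply ResultPath first, then OutputPath, as a Task state does."""
    if result_path is None:
        # "ResultPath": null -> discard the result, keep the raw input
        combined = copy.deepcopy(raw_input)
    elif result_path == "$":
        # default -> the result replaces the input entirely
        combined = task_result
    else:
        # single-level "$.key" -> insert the result into a copy of the input
        combined = copy.deepcopy(raw_input)
        combined[result_path[2:]] = task_result
    return select(output_path, combined)

# Hypothetical execution input; only the key name comes from the question
execution_input = {"scenario": "demo"}
# What lambda:invoke hands back when the function returns null (abridged)
lambda_result = {"Payload": None, "StatusCode": 200}

as_in_question = run_task_state(execution_input, lambda_result, output_path="$.Payload")
passthrough = run_task_state(execution_input, lambda_result, result_path=None)

print(as_in_question)  # None
print(passthrough)     # {'scenario': 'demo'}
```

The first call mirrors the definition in the question and hands null to the next state; the second discards the result, so the original input survives.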

Related

How to access input of state machine in any node at AWS Step Functions

Let's say I have this state machine in AWS Step Function:
And I had started it with this input:
{
"item1": 1,
"item2": 2,
"item3": 3
}
It's clear for me that Action A is receiving the input payload. But, how can Action C access the state machine input to get the value of item3? Is it possible?
Thanks!!
Typically, the data available in Action C will be dependent on what the result/output of Action B is.
However, if you just care about the original input to the state machine execution, you can set the payload of Action C using the Context Object.
// roughly
"Action C": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"Payload.$": "$$.Execution.Input",
"FunctionName": "<action c lambda>"
}
}
Check out the AWS documentation for Context Object
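For illustration, this is roughly what the Lambda behind Action C would see, assuming it is written in Python and the state passes "$$.Execution.Input" as the Payload as sketched above. The handler name and the returned shape are made up for the example.

```python
def action_c_handler(event, context):
    """Hypothetical Lambda behind Action C.

    With "Payload.$": "$$.Execution.Input", `event` is the original
    execution input, regardless of what Action B returned.
    """
    return {"item3": event["item3"]}

# Local check using the execution input from the question
original_input = {"item1": 1, "item2": 2, "item3": 3}
print(action_c_handler(original_input, None))  # {'item3': 3}
```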

AWS Step Functions SendSQSMessage: Dynamic MessageGroupId

I am using the AWS CDK to create a state machine that sends a message to a fifo queue and waits for a callback from the lambda worker to continue execution.
I would like the messages that get sent to the fifo queue to have a dynamic MessageGroupId assigned to them so I can control the number of lambda workers processing the messages. The only way I can think of to have a dynamic MessageGroupId is to reference some parameter on the step function input with JsonPath, however I have not come across any documentation about it. My initial tests to use JsonPath to dynamically pass the MessageGroupId failed, simply passing the string "$.MessageGroupId" effectively giving each message the same message group id and thus one lambda worker.
Is it possible to dynamically assign a message group id to a sqs message when sent from a step function?
If so, how?
With the help of AWS Support, I managed to do it by either using the Context Object or passing an ID in the initial input and referencing it with $.
Here's an example:
{
"Comment": "Generate unique MessageGroupId",
"StartAt": "Start",
"States": {
"Start": {
"Type": "Task",
"TimeoutSeconds": 60,
"Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
"Parameters": {
"QueueUrl": "<YOUR_QUEUE_URL>",
"MessageBody": {
"Input.$": "$",
"TaskToken.$": "$$.Task.Token"
},
"MessageGroupId.$": "$$.Execution.Id"
},
"ResultPath": "$",
"End": true
}
}
}
My problem was that I was trying to set MessageGroupId like so:
"MessageGroupId": "$$.Execution.Id"
Where I should have done:
"MessageGroupId.$": "$$.Execution.Id"
Appending .$ to the key makes Step Functions resolve the expression "$$.Execution.Id" instead of passing the literal string "$$.Execution.Id".
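The difference between the two spellings can be imitated with a small Python sketch. This is only a model of the rule (keys ending in ".$" are resolved, everything else is passed through literally), not the service's actual code; lookup and resolve_parameters are hypothetical helpers, and the execution ARN and state input are invented values.

```python
def lookup(path, state_input, context):
    """Resolve "$..." against the state input and "$$..." against the context object."""
    if path.startswith("$$"):
        root, rest = context, path[2:]
    else:
        root, rest = state_input, path[1:]
    node = root
    for key in filter(None, rest.split(".")):
        node = node[key]
    return node

def resolve_parameters(params, state_input, context):
    """Model of Parameters evaluation: only keys ending in ".$" are resolved."""
    resolved = {}
    for key, value in params.items():
        if isinstance(value, dict):
            resolved[key] = resolve_parameters(value, state_input, context)
        elif key.endswith(".$"):
            resolved[key[:-2]] = lookup(value, state_input, context)
        else:
            resolved[key] = value  # plain key: the value is passed through literally
    return resolved

# Invented context object and state input, for illustration only
context = {"Execution": {"Id": "arn:aws:states:us-east-1:123456789012:execution:demo:run-1"}}
state_input = {"order": 42}

literal = resolve_parameters({"MessageGroupId": "$$.Execution.Id"}, state_input, context)
dynamic = resolve_parameters({"MessageGroupId.$": "$$.Execution.Id"}, state_input, context)

print(literal["MessageGroupId"])  # $$.Execution.Id
print(dynamic["MessageGroupId"])  # arn:aws:states:us-east-1:123456789012:execution:demo:run-1
```

With the plain key every message would carry the same literal group id, which matches the single-worker behavior described in the question.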

AWS X-Ray trace segments are missing or not connected

I have code that creates a segment when a queue is being read. In the first function (within the same lambda) I have this:
import * as AWSXRay from 'aws-xray-sdk'; // (using TypeScript)
AWSXRay.enableManualMode();
var segment1 = new AWSXRay.Segment("A");
In the second function (within the same lambda), called from the first, I have something like this:
var segment2 = new AWSXRay.Segment("B", segment1.trace_id, segment1.id);
Instead of seeing
*->A->B
On the AWS graph (on the website), I see:
*->A
*->B
...where they are not even associated, even though they have the same tracing ID, and the parent IDs are properly set. I seem to be missing something but not sure what...?
I even tried to pull X-Amzn-Trace-Id from the API request to use that as the root tracking ID for everything but that didn't work either.
This is the JSON for the first segment (A):
{
"Duration": 0.808,
"Id": "1-5d781a08-d41b49e35c3c0f38cdbd4912",
"Segments": [
{
"Document": {
"id": "74c99567f73185ce",
"name": "router",
"start_time": 1568152071.979,
"end_time": 1568152072.787,
"parent_id": "ef34fc0bcf23bbbe",
"aws": {
"xray": {
"sdk": "X-Ray for Node.js",
"sdk_version": "2.3.6",
"package": "aws-xray-sdk"
}
},
"service": {
"version": "unknown",
"runtime": "node",
"runtime_version": "v10.16.3",
"name": "unknown"
},
"trace_id": "1-5d781a08-d41b49e35c3c0f38cdbd4912"
},
"Id": "74c99567f73185ce"
}
]
}
This is the JSON for the second segment (B):
{
"Duration": 0.801,
"Id": "1-5d781a08-d9626abbab1cfbbfe4ff0dff",
"Segments": [
{
"Document": {
"id": "e2b4faaa6538bbb2",
"name": "handleCreateLoad",
"start_time": 1568152071.98,
"end_time": 1568152072.781,
"parent_id": "74c99567f73185ce",
"aws": {
"xray": {
"sdk": "X-Ray for Node.js",
"sdk_version": "2.3.6",
"package": "aws-xray-sdk"
}
},
"service": {
"version": "unknown",
"runtime": "node",
"runtime_version": "v10.16.3",
"name": "unknown"
},
"trace_id": "1-5d781a08-d9626abbab1cfbbfe4ff0dff",
"subsegments": [
{
"id": "08ccf2f374364066",
"name": "...-CreateLoad",
"start_time": 1568152071.981,
"end_time": 1568152072.781
}
]
},
"Id": "e2b4faaa6538bbb2"
}
]
}
It's quite clear that the parent ID for 'B' (74c99567f73185ce) points to "A"'s ID, but the graph does not connect them.
Also, I think _X_AMZN_TRACE_ID should be set when the lambda executes, but it is not. That may be the root of my issues.
Turns out process.env._X_AMZN_TRACE_ID, required by the AWS X-Ray SDK, does NOT exist until the handler is called. It may help others to know what I went through:
At first I tried to get the trace details for the current lambda on start up (before the handler is called) to connect my new segments, but it didn't work. I have many handlers in the same project, so getting the lambda segment on startup is what I was hoping to do.
I then proceeded to create a main lambda segment (thinking I had to create the first segment myself) but all it did was create an orphaned segment. To make matters worse, each segment creates a new trace ID if one is not provided, and since I could not get the trace ID from the global start-up scope, nothing was connecting. The proper trace ID is important to pass along from start to finish for each request to make sure the calls down-stream are tracked properly.
Dumping the environment variables before and after the handler is called clearly showed that the trace ID is not provided until just before the handler gets called. It's sad that most of the online examples don't even bother to warn about this. I then moved the call to AWSXRay.getSegment() to the start of the lambda handler and passed the details on to the child segments.
DO NOT set context.callbackWaitsForEmptyEventLoop = false while also calling the callback(error, response) callback passed to the lambda handler. Doing so will terminate the lambda without waiting for segment update events to flush to the daemon, resulting in orphaned segments. :(
Note: This documentation is lacking: https://docs.aws.amazon.com/xray-sdk-for-nodejs/latest/reference/
It states "You can retrieve the current segment or subsegment at any time" when in fact there are some times when you cannot. It's too bad there are no proper examples using actual working NodeJS Lambda code, instead of isolated lines of code thrown everywhere.
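The same "read it inside the handler, not at start-up" rule can be sketched in Python. This is an illustration only: it assumes the Python Lambda runtime sets _X_AMZN_TRACE_ID per invocation in the same way described above for Node.js, the helper parse_trace_header is made up, and the header value used for the local run is assembled from the IDs in the JSON above.

```python
import os

def parse_trace_header(header):
    """Split "Root=...;Parent=...;Sampled=..." into a dict."""
    fields = {}
    for part in header.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields

# Read at import time (cold start): per the finding above, expect this to be unset on Lambda.
TRACE_AT_IMPORT = os.environ.get("_X_AMZN_TRACE_ID")

def handler(event, context):
    # Read inside the handler, where the variable is available for this invocation.
    header = os.environ.get("_X_AMZN_TRACE_ID", "")
    trace = parse_trace_header(header)
    return {"root": trace.get("Root"), "parent": trace.get("Parent")}

# Local simulation: set the variable by hand, then call the handler
os.environ["_X_AMZN_TRACE_ID"] = (
    "Root=1-5d781a08-d41b49e35c3c0f38cdbd4912;Parent=ef34fc0bcf23bbbe;Sampled=1"
)
print(handler({}, None))
```

The Root value is what child segments need as their trace ID, and Parent is the ID of the segment Lambda created for the invocation.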

Passthrough input to output in AWS Step Functions

How can I passthrough the input to a Task state in an AWS Step Functions to the output?
After reading the Input and Output Processing page in the AWS docs, I have played with various combinations of InputPath, ResultPath and OutputPath.
State definition:
"First State": {
"Type": "Task",
"Resource": "[My Lambda ARN]",
"Next": "Second State",
"InputPath": "$.someKey",
"OutputPath": "$"
}
Input:
{
"someKey": "someValue"
}
Expected Result
I would like the output of the First State (and thus the input of Second State) to be
{
"someKey": "someValue"
}
Actual Result
[empty]
What if the input is more complicated, e.g.
{
"firstKey": "firstValue",
"secondKey": "secondValue"
}
I would like to forward all of it without worrying about (sub) paths.
In the Amazon States Language spec it is stated that:
If the value of ResultPath is null, that means that the state’s own raw output is discarded and its raw input becomes its result.
Consequently, I updated my state definition to
"First State": {
"Type": "Task",
"Resource": "[My Lambda ARN]",
"Next": "Second State",
"ResultPath": null
}
As a result, the Task input payload is copied to the output, even for rich objects like:
{
"firstKey": "firstValue",
"secondKey": "secondValue"
}
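The key point is that an omitted ResultPath and an explicit null are not the same thing. A small Python sketch of that distinction, using the state definition above as a plain dict (effective_result_path is a made-up helper, not part of any AWS library):

```python
import json

def effective_result_path(state):
    """Return the ResultPath a state will use: "$" if omitted, None if explicitly null."""
    return state.get("ResultPath", "$")

omitted = {"Type": "Task", "Resource": "[My Lambda ARN]", "Next": "Second State"}
explicit_null = {**omitted, "ResultPath": None}

print(effective_result_path(omitted))        # $
print(effective_result_path(explicit_null))  # None
print(json.dumps(explicit_null))             # ends with "ResultPath": null}
```

Leaving the key out means the result replaces the input; serializing it as null is what tells the state to discard the result and forward its input.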
For those who find themselves here using CDK, the solution is to use the explicit aws_stepfunctions.JsonPath.DISCARD enum rather than None/null.
from aws_cdk import (
aws_stepfunctions,
aws_stepfunctions_tasks,
)
aws_stepfunctions_tasks.LambdaInvoke(
self,
"my_function",
lambda_function=lambda_function,
result_path=aws_stepfunctions.JsonPath.DISCARD,
)
https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-stepfunctions.JsonPath.html#static-discard
I was looking for a way to pass input from one parallel state to another parallel state, and the above option worked really well.
For example, my step function looks like this: task1 -> parallel task2 -> parallel task3 -> task4. When parallel task3 starts, the input values are wiped out, so it fails. With the above option, I was able to pass the same input from parallel task2 to parallel task3.

How to specify multiple result path values in AWS Step Functions

I have an AWS Step Functions state formatted as follows:
"MyState": {
"Type": "Task",
"Resource": "<MyLambdaARN>",
"ResultPath": "$.value1",
"Next": "NextState"
}
I want to add a second value but can't find out how anywhere. None of the AWS examples display multiple ResultPath values being added to the output.
Would I just add a comma between them?
"MyState": {
"Type": "Task",
"Resource": "<MyLambdaARN>",
"ResultPath": "$.value1, $.value2",
"Next": "NextState"
}
Or is there a better way to format these?
Let's answer this straight up: you can't specify multiple ResultPath values, because it doesn't make sense. Amazon does do a pretty bad job of explaining how this works, so I understand why this is confusing.
You can, however, return multiple result values from a State in your State Machine.
General Details
The input to any State is a JSON object. The output of the State is a JSON object.
ResultPath directs the State Machine what to do with the output (result) of the State. Without specifying ResultPath, it defaults to $ which means all the input to the State is lost, replaced by the output of the State.
If you want to allow data from the input JSON to pass through your State, you specify a ResultPath to describe a property to add/overwrite on the input JSON to pass to the next State.
Your scenario
In your case, $.value1 means the output JSON of your State is the input JSON with a new/overwritten property value1 containing the output JSON of your lambda.
If you want multiple values in your output, your lambda should return a JSON object containing the multiple values, which will be the value of the value1 property.
If you don't care about allowing input values passing through your State, leave the ResultPath as the default $ by omitting it. The output JSON containing your multiple values will be the input to the next State.
Support scenario
Here's a simple State machine I use to play with the inputs and outputs:
{
"StartAt": "State1",
"States": {
"State1": {
"Type": "Pass",
"Result": { "Value1": "Yoyo", "Value2": 1 },
"ResultPath": "$.Result",
"Next": "State2"
},
"State2": {
"Type": "Pass",
"Result": { "Value2": 5 },
"ResultPath": "$.Result",
"Next": "State3"
},
"State3": {
"Type": "Pass",
"Result": "Done",
"End": true
}
}
}
Execute this with the following input:
{
"Input 1": 10000,
"Input 2": "YOLO",
"Input 3": true
}
Examine the inputs and outputs of each State. You should observe the following:
The input is passed all the way through, because the ResultPath always directs output to a Result property of the input.
The output of State1 is overwritten by the Output of State2. The net effect is Result.Value1 disappears and Result.Value2 is "updated".
Hopefully this clarifies how to use ResultPath effectively.
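If you would rather trace it offline first, the following Python sketch walks the same three Pass states over the same input. It is a simplified model (run_pass_state is a made-up helper that only understands "$" and single-level "$.key" ResultPath values), not a replacement for running the state machine.

```python
import copy

def run_pass_state(state, state_input):
    """Apply a Pass state's Result at its ResultPath (default "$")."""
    result_path = state.get("ResultPath", "$")
    if result_path == "$":
        return state["Result"]  # the result replaces the whole input
    output = copy.deepcopy(state_input)
    output[result_path[2:]] = state["Result"]  # single-level "$.key" only
    return output

states = {
    "State1": {"Result": {"Value1": "Yoyo", "Value2": 1}, "ResultPath": "$.Result", "Next": "State2"},
    "State2": {"Result": {"Value2": 5}, "ResultPath": "$.Result", "Next": "State3"},
    "State3": {"Result": "Done", "End": True},
}

data = {"Input 1": 10000, "Input 2": "YOLO", "Input 3": True}
name = "State1"
outputs = {}
while name:
    data = run_pass_state(states[name], data)
    outputs[name] = data
    print(name, "->", data)
    name = states[name].get("Next")
```

In this model State1 and State2 both keep the three input keys and differ only in the Result property, while State3, which has no ResultPath, outputs just the string "Done".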
You cannot specify several values in ResultPath, because ResultPath defines the path of your result value in the JSON. A close analogy for ResultPath is the return value of a function: since your step can return only one value, it has to be put into one node of the resulting JSON.
If you have an input json
{
"myValue": "value1",
"myArray": [1,2,3]
}
And define your ResultPath as $.myResult the overall resulting json will be
{
"myValue": "value1",
"myArray": [1,2,3],
"myResult": "result"
}
Now you can truncate this json to pass only part of it to the next step in your function using OutputPath (e.g. OutputPath: "$.myResult")
InputPath and OutputPath can have several nodes in their definition, but ResultPath should always have only 1 node.
Seems like today it can be done by using ResultSelector.
I think it would be like the following, but I haven't actually done this so I can't say for sure.
"ResultPath": {
"var1" : "$.value1",
"var2" : "$.value2"
},
=====
After looking further into this, I am convinced that there is no direct way to do what you want to do. Here is a way that could give you the results that you want.
1) You would omit the InputPath, OutputPath, and ResultPath from your step. This would mean that all of $ would be passed in as the input to your step and that all of the output from the lambda function would be stored as $. In the lambda function you could set the result fields to be whatever you want them to be. The lambda function must return the modified input as its output.
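A minimal sketch of such a function, assuming a Python Lambda; the handler name and the two placeholder values are invented for the example:

```python
def handler(event, context):
    """Hypothetical Lambda: return the incoming event plus two new fields."""
    output = dict(event)                 # shallow copy so the input is not mutated
    output["value1"] = "first result"    # placeholder values
    output["value2"] = "second result"
    return output

incoming = {"existing": "data"}
print(handler(incoming, None))
# {'existing': 'data', 'value1': 'first result', 'value2': 'second result'}
```

Because the step has no ResultPath, the returned object replaces the state input, so the next state sees the original keys together with value1 and value2.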