Map step Lambda function with empty event is raising a KeyError - amazon-web-services

I'm writing a Step Functions state machine where Lambda A queries a database and creates an array of s3bucket/s3key dictionaries, one for each item that needs to be processed.
For the processing, I have a Map state that takes each s3bucket/s3key and does some processing.
For some reason, the event arrives empty, which results in an error in the Map state execution.
Here is the state machine definition:
{
"Comment": "A Hello World example of the Amazon States Language using Pass states",
"StartAt": "QueryTable",
"States": {
"QueryTable": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "arn:aws:lambda:...function:query_table:$LATEST"
},
"OutputPath": "$.Payload",
"Next": "ProcessDocuments"
},
"ProcessDocuments": {
"Type": "Map",
"ItemsPath": "$.toprocess",
"MaxConcurrency": 5,
"Iterator": {
"StartAt": "Process",
"States": {
"Process": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "arn:aws:lambda:...:...:function:process:$LATEST"
},
"End": true
}
}
},
"End": true
}
}
}
And the error
{
"resourceType": "lambda",
"resource": "invoke",
"error": "KeyError",
"cause": {
"errorMessage": "'s3bucket'",
"errorType": "KeyError",
"stackTrace": [
" File \"/var/task/process.py\", line 260, in lambda_handler\n raise e\n",
" File \"/var/task/process.py\", line 230, in lambda_handler\n s3bucket, s3key = extract_event_data(event)\n",
" File \"/var/task/process.py\", line 125, in extract_event_data\n bucket = record['s3bucket']\n"
]
}
}
I read the ItemsPath and InputPath documentation. I also used this example to check whether I was using it incorrectly, but it didn't help.
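A likely cause, stated with some caution: in the Process state, Parameters only sets FunctionName. With the `arn:aws:states:::lambda:invoke` integration, the function receives whatever is passed as `Payload`, so the Map item has to be forwarded explicitly, for example by adding `"Payload.$": "$"` next to `FunctionName`. Independently of that fix, a defensive `extract_event_data` makes the empty-event case obvious instead of surfacing as a bare KeyError. The sketch below is hypothetical (the real `process.py` isn't shown) and assumes each Map item looks like `{"s3bucket": "...", "s3key": "..."}`:

```python
from typing import Any, Mapping, Tuple


def extract_event_data(event: Mapping[str, Any]) -> Tuple[str, str]:
    """Return (s3bucket, s3key) from a single Map item.

    Hypothetical rewrite: assumes the Map iterator forwards each item of
    $.toprocess as the Lambda payload, e.g. {"s3bucket": "...", "s3key": "..."}.
    """
    missing = [key for key in ("s3bucket", "s3key") if key not in event]
    if missing:
        # An empty event usually means the Task state did not pass a Payload.
        raise ValueError(
            f"Map item is missing {missing}; received event={dict(event)!r}. "
            "Check that the Process state passes the item as its Payload."
        )
    return event["s3bucket"], event["s3key"]
```

With a correctly forwarded item, `extract_event_data({"s3bucket": "my-bucket", "s3key": "docs/a.pdf"})` returns `("my-bucket", "docs/a.pdf")`; with `{}` it fails with a message naming both missing keys.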

Related

Error in using InputPath to select parts of input in a Step Functions workflow

I am creating a Step Functions workflow which has various steps. I am referring to the "InputPath, ResultPath and OutputPath Examples" topic in their documentation. I am trying to check the identity and address of a person in my workflow, as shown in that document. I'm passing the input for the Verify identity step within the state machine definition, inside Parameters. My workflow looks like this.
However, when I run it, I get the error: An error occurred while executing the state 'Verify identity' (entered at the event id #19). Invalid path '$.identity' : Property ['identity'] not found in path $
What am I doing wrong here? Can someone please explain?
Thanks.
{
"StartAt": "Step1",
"States": {
"Step1": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
...something...
},
"Next": "Step2"
},
"Step2": {
"Type": "Choice",
"Choices": [
Do something...
],
"Default": "Step3.1"
},
"Step3.1": {
"Type": "Task",
...something...
}
},
"Next": "Step3.3"
},
...something...,
"Step4": {
"Type": "Parallel",
"Branches": [
{
"StartAt": "Verify identity",
"States": {
"Verify identity": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"InputPath": "$.identity",
"Parameters": {
"Payload": {
"identity": {
"email": "jdoe@example.com",
"ssn": "123-45-6789"
},
"firstName": "Jane",
"lastName": "Doe"
},
"FunctionName": "{Lambda ARN}"
},
"End": true
}
}
},
{
"StartAt": "Verify address",
"States": {
"Verify address": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"Payload": {
"street": "123 Main St",
"city": "Columbus",
"state": "OH",
"zip": "43219"
},
"FunctionName": "{Lambda ARN}"
},
"End": true
}
}
}
],
"Next": "Step5"
},
"Step5": {
"Type": "Task",
"Parameters": {
something...
},
"End": true
}
}
Your example doesn't show an explicit transition to Step4, but assuming the order you defined (Step1 -> Step2 -> Step3.1 -> Step3.3 -> Step4),
the output from Step3.3 should look something like:
{
"cat": "meow",
"dog": "woof",
"identity": { // this is what's missing
"email": "jdoe@example.com",
"ssn": "123-45-6789"
}
}
This is what gets passed to each branch of your Parallel state (Step4).
However, since you have an InputPath defined for Step4."Verify identity", the effective input to the task becomes:
{
"email": "jdoe@example.com",
"ssn": "123-45-6789"
}
The error you're seeing,
An error occurred while executing the state 'Verify identity' (entered at the event id #19). Invalid path '$.identity' : Property ['identity'] not found in path $
means the "identity" key ($.identity) isn't being added to the output of Step3.3 ($).
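To make the InputPath behaviour concrete, here is a small Python sketch of how a one-level path like `$.identity` selects part of the state input. `apply_input_path` is a hypothetical helper that only handles `$` and single-key paths; it is not the Step Functions path engine, it just reproduces the situation behind the error above.

```python
from typing import Any, Dict


def apply_input_path(state_input: Dict[str, Any], path: str) -> Any:
    """Select part of a state's input the way a simple InputPath would.

    Hypothetical helper: handles only "$" and one-level paths such as
    "$.identity"; the real Step Functions engine supports full JsonPath.
    """
    if path == "$":
        return state_input
    if not path.startswith("$.") or "." in path[2:]:
        raise ValueError(f"unsupported path in this sketch: {path!r}")
    key = path[2:]
    if key not in state_input:
        # Same situation as the runtime error in the question.
        raise KeyError(f"Invalid path '{path}' : Property ['{key}'] not found in path $")
    return state_input[key]


step3_3_output = {
    "cat": "meow",
    "dog": "woof",
    "identity": {"email": "jdoe@example.com", "ssn": "123-45-6789"},
}
print(apply_input_path(step3_3_output, "$.identity"))
# {'email': 'jdoe@example.com', 'ssn': '123-45-6789'}
```

Calling `apply_input_path({"cat": "meow"}, "$.identity")` instead raises, which is the Python analogue of Step3.3 not producing an "identity" key.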

AWS Step Functions Consuming messages from SQS

I am consuming messages from SQS to trigger queries.
When I normally consume a message from SQS in Python, I need to delete the message from SQS.
Do I have to manually delete the message from SQS in a Step Function?
What is the best/simplest way to do so?
I believe Step Functions already integrates with SQS:
{
"Comment": "Run Redshift Queries",
"StartAt": "ReceiveMessage from SQS",
"States": {
"ReceiveMessage from SQS": {
"Type": "Task",
"Parameters": {
"QueueUrl": "******"
},
"Resource": "arn:aws:states:::aws-sdk:sqs:receiveMessage",
"Next": "Run Analysis Queries",
"ResultSelector": {
"body.$": "States.StringToJson($.Messages[0].Body)"
}
},
"Run Analysis Queries": {
"Type": "Task",
"Parameters": {
"ClusterIdentifier": "******",
"Database": "prod",
"Sql": "select * from ******"
},
"Resource": "arn:aws:states:::aws-sdk:redshiftdata:executeStatement",
"End": true
}
},
"TimeoutSeconds": 3600
}
I just did a test, and it seems that the message count goes down temporarily but then goes back up again.
Is the best approach to insert a Lambda between the "ReceiveMessage from SQS" stage and the Redshift stage?
This raised another question: I have only run this manually. How do I eventually trigger this Step Function to run on every message?
If you must use SQS, then you will need to have a lambda function to act as a proxy. You will need to set up the queue as a lambda trigger, and you will need to write a lambda that can parse the SQS message and make the appropriate call to the Step Functions StartExecution API.
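A minimal sketch of such a proxy in Python, assuming the state machine ARN is provided through a hypothetical `STATE_MACHINE_ARN` environment variable. It uses boto3's `stepfunctions` client and its `start_execution(stateMachineArn=..., input=...)` call; `sfn_client` is an optional parameter so the function can be tested without AWS. With an SQS trigger, Lambda deletes the received batch from the queue when the handler returns successfully, so in this setup the state machine itself would not need an `sqs:deleteMessage` step.

```python
import json
import os
from typing import Any, Dict, List, Optional


def lambda_handler(event: Dict[str, Any], context: Any = None,
                   sfn_client: Optional[Any] = None) -> Dict[str, List[str]]:
    """Start one Step Functions execution per SQS record.

    Assumes the state machine ARN is in a hypothetical STATE_MACHINE_ARN
    environment variable; sfn_client can be injected for testing.
    """
    if sfn_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        sfn_client = boto3.client("stepfunctions")

    state_machine_arn = os.environ["STATE_MACHINE_ARN"]
    execution_arns = []
    for record in event.get("Records", []):
        # Wrap the raw body so the execution input is always valid JSON.
        execution_input = json.dumps({"body": record["body"]})
        response = sfn_client.start_execution(
            stateMachineArn=state_machine_arn,
            input=execution_input,
        )
        execution_arns.append(response["executionArn"])
    return {"executionArns": execution_arns}
```

If the handler raises instead, the messages are not deleted and become visible again after the visibility timeout, which gives you retries for free.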
After you consume a message, you have to delete it using sqs:deleteMessage. You see it reappear in the queue because once a message is read by an application, it becomes hidden for the queue's visibility timeout (30 seconds by default) so that other applications don't process it simultaneously.
Here is an example of how to read, process and delete a message from the queue. Note that I set MaxNumberOfMessages to 1 and used a ResultPath other than $:
"ReceiveMessage from SQS": {
"Type": "Task",
"Parameters": {
"MaxNumberOfMessages": 1,
"QueueUrl": "******"
},
"Resource": "arn:aws:states:::aws-sdk:sqs:receiveMessage",
"Next": "Run Analysis Queries",
"ResultSelector": {
"body.$": "States.StringToJson($.Messages[0].Body)",
"receiptHandle.$": "$.Messages[0].ReceiptHandle"
}
},
"Run Analysis Queries": {
"Type": "Task",
"Parameters": {
"ClusterIdentifier": "******",
"Database": "prod",
"Sql": "select * from ******"
},
"Resource": "arn:aws:states:::aws-sdk:redshiftdata:executeStatement",
"ResultPath": "$.redshift_output",
"Next": "delete_sqs"
},
"delete_sqs": {
"Comment": "Deletes SQS message",
"Type": "Task",
"Resource": "arn:aws:states:::aws-sdk:sqs:deleteMessage",
"Parameters": {
"ReceiptHandle.$": "$.receiptHandle",
"QueueUrl": "******"
},
"ResultPath": null,
"Next": "update_result"
}
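For comparison, here is the same receive, process, delete order written in Python with boto3's SQS client (`receive_message` and `delete_message`). It is only a sketch: `handle` is a hypothetical callback standing in for the Redshift step, and the client is passed in so it can be replaced in tests. The point it illustrates is that the receipt handle from the receive call has to survive until the delete call.

```python
import json
from typing import Any, Callable, Dict


def process_one_message(sqs_client: Any, queue_url: str,
                        handle: Callable[[Dict[str, Any]], None]) -> bool:
    """Receive, process and delete a single SQS message.

    Returns False when the queue returned no message. The handle callback is a
    hypothetical stand-in for the Redshift step.
    """
    response = sqs_client.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
    messages = response.get("Messages", [])  # key is absent when nothing was received
    if not messages:
        return False

    message = messages[0]
    # Equivalent of States.StringToJson($.Messages[0].Body)
    handle(json.loads(message["Body"]))

    # Delete only after processing succeeded; otherwise the message reappears
    # once its visibility timeout expires.
    sqs_client.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return True
```

In real use you would pass `boto3.client("sqs")` as `sqs_client`.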
You can also read up to 10 messages at a time by setting MaxNumberOfMessages to 10 and processing them with a Map state, as in this example:
{
"StartAt": "read_sqs",
"States": {
"read_sqs": {
"Type": "Task",
"Resource": "arn:aws:states:::aws-sdk:sqs:receiveMessage",
"Parameters": {
"MaxNumberOfMessages": 10,
"QueueUrl": "*******"
},
"ResultPath": "$.queueResponse",
"Next": "check_results"
},
"check_results": {
"Comment": "Checking if queue is empty",
"Type": "Choice",
"Choices": [
{
"Variable": "$.queueResponse.Messages[0]",
"IsPresent": true,
"Next": "map_results"
}
],
"Default": "exit"
},
"map_results": {
"Comment": "Performs a 'map' operation over each payload",
"Type": "Map",
"ItemsPath": "$.queueResponse.Messages",
"MaxConcurrency": 10,
"Iterator": {
"StartAt": "read_request",
"States": {
"read_request": {
"Comment": "Parses and moves the request body into the response",
"Type": "Pass",
"Parameters": {
"requestBody.$": "States.StringToJson($.Body)"
},
"ResultPath": "$.map_response",
"Next": "Run Analysis Queries"
},
"Run Analysis Queries": {
"Type": "Task",
"Parameters": {
"ClusterIdentifier": "******",
"Database": "prod",
"Sql": "select * from ******"
},
"Resource": "arn:aws:states:::aws-sdk:redshiftdata:executeStatement",
"ResultPath": "$.redshift_output",
"Next": "delete_sqs"
},
"delete_sqs": {
"Comment": "Deletes SQS message",
"Type": "Task",
"Resource": "arn:aws:states:::aws-sdk:sqs:deleteMessage",
"Parameters": {
"ReceiptHandle.$": "$.ReceiptHandle",
"QueueUrl": "*******"
},
"ResultPath": null,
"End": true
}
}
},
"ResultPath": "$.flowResponse",
"Next": "exit"
},
"exit": {
"Type": "Pass",
"End": true
}
}
}

Break a Map loop execution in AWS Step Functions

I'm trying to build a state machine with a loop (Map) inside that can be stopped whenever a specific error is thrown, something like this:
"Job": {
"Type": "Map",
"InputPath": "$.content",
"ItemsPath": "$.data",
"MaxConcurrency": 0,
"Iterator": {
"StartAt": "Validate",
"States": {
"Validate": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:ship-val",
"Catch": [
{
"ErrorEquals": [
"ErrorOne"
],
"Next": "BreakLoop"
},
{
"ErrorEquals": ["States.ALL"],
"Next": "FailUncaughtError"
}
],
"End": true
},
"FailUncaughtError":{
"Type": "Fail",
"Error": "Uncaught error"
},
"BreakLoop":{
"Type": "Fail",
"Error": "the loop should be stopped"
}
}
},
"ResultPath": "$.content.data",
"End": true
}
I tried pointing the Catch's Next to a state outside the Map, but I couldn't, because states inside the Map can only transition to other states within it. Moreover, as far as I know, the AWS docs don't mention a feature like this.
Instead of catching the error inside the Map state, don't catch it; let the Map state fail.
Then add a Catch to the Map state itself, and if the error is the one you are looking for, continue to the next step:
{
"StartAt": "Map",
"States": {
"Map": {
"Type": "Map",
"ItemsPath": "$.array",
"Iterator": {
"StartAt": "FaultyLambda",
"States": {
"FaultyLambda": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "your function arn",
"Payload": {
"a": 1
}
},
"End": true
}
}
},
"Catch": [
{
"ErrorEquals": ["ErrorOne"],
"Next": "BreakLoop"
}
],
"Next": "BreakLoop"
},
"BreakLoop": {
"Type": "Pass",
"End": true
}
}
}
Any other error will not be caught and will fail your entire execution.
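On the Lambda side, the error name the Map state's Catch sees comes from the exception itself, as the `"error": "KeyError"` in the first question's output shows: a Python exception's class name becomes the error name. A hypothetical `ErrorOne` exception with an illustrative stop condition (`"stop": true` on the item, an assumption) could look like this:

```python
from typing import Any, Dict


class ErrorOne(Exception):
    """Hypothetical error that signals the Map loop should stop."""


def lambda_handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    # Hypothetical stop condition: the item carries "stop": true.
    if event.get("stop"):
        # The class name becomes the error name, so "ErrorEquals": ["ErrorOne"] matches.
        raise ErrorOne(f"stopping on item {event!r}")
    return {**event, "validated": True}
```

Because the exception is not caught inside the iterator, the iteration fails, the Map state fails with `ErrorOne`, and the Map-level Catch routes to BreakLoop.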

AWS Step function error : There are Amazon States Language errors in your state machine definition. Fix the errors to continue

I'm new to AWS Step Functions.
I'm trying to create a basic ETL flow of Glue jobs. After completing the state machine definition I can see the graph being generated, but I get a generic error, "There are Amazon States Language errors in your state machine definition. Fix the errors to continue",
that doesn't let me proceed.
Here is the code and graph:
{
"Comment": "DRC downstream glue jobs execution step function:slf_aws_can_dbisdel_everyone_drc_amp",
"StartAt": "startFlow",
"States": {
"Comment": "various state types of the Amazon States Language",
"startFlow": {
"Comment": "Pass states are useful when constructing and debugging state machines.",
"Type": "Pass",
"Next": "stg_ods"
},
"stg_ods": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "stage_job_name"
},
"Next": "ods_job"
},
"ods_job": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "main_job_name"
},
"Next": "Wait 3 sec"
},
"Wait 3 sec": {
"Comment": "A Wait state delays the state machine from continuing for a specified time.",
"Type": "Wait",
"Seconds": 3,
"Next": "parallel_stg_adr"
},
"parallel_stg_adr": {
"Comment": "A Parallel state can be used to create parallel branches of execution in your state machine.",
"Type": "Parallel",
"Branches": [
{
"StartAt": "stg_job1",
"States": {
"stg_job1": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "stg_job_name1"
},
"End": true
}
}
},
{
"StartAt": "stg_job2",
"States": {
"stg_job2": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "stg_job_name2"
},
"End": true
}
}
}
],
"Next": "parallel_adr_job"
},
"parallel_adr_job": {
"Comment": "A Parallel state can be used to create parallel branches of execution in your state machine.",
"Type": "Parallel",
"Branches": [
{
"StartAt": "job1",
"States": {
"job1": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "some_glue_job",
"Arguments": {
"--target_table": "some_string_table",
"--calendar_key": "some_string"
}
},
"End": true
}
}
},
{
"StartAt": "job2",
"States": {
"job2": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "some_glue_job",
"Arguments": {
"--target_table": "some_string_table",
"--calendar_key": "some_string"
}
},
"End": true
}
}
}
],
"Next": "end_job"
},
"end_job": {
"Type": "Pass",
"End": true
}
}
}
Step function graph
"Comment": "various state types of the Amazon States Language",
This one at line 5 seems to be incorrect: the "States" object cannot have a "Comment" key, because every key inside "States" is treated as a state name. Remove it and try again. The rest of the config looks correct.
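To see why that key breaks the definition, here is a small Python check that flags entries in "States" which are not state objects. It is a hypothetical helper, not the console's validator, and it only inspects the top level (not Parallel branches or Map iterators).

```python
import json
from typing import Any, Dict, List


def find_non_state_entries(definition: Dict[str, Any]) -> List[str]:
    """List top-level "States" entries that are not state objects.

    Sketch only: checks the top level, not Parallel branches or Map iterators.
    """
    problems = []
    for name, state in definition.get("States", {}).items():
        if not isinstance(state, dict):
            problems.append(f"States.{name}: expected a state object, got {type(state).__name__}")
        elif "Type" not in state:
            problems.append(f"States.{name}: missing Type")
    return problems


definition = json.loads("""
{
  "StartAt": "startFlow",
  "States": {
    "Comment": "various state types of the Amazon States Language",
    "startFlow": {"Type": "Pass", "End": true}
  }
}
""")
print(find_non_state_entries(definition))
# ['States.Comment: expected a state object, got str']
```

Deleting the "Comment" entry makes the check return an empty list.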
Edit 1
If your state machine is an Express workflow, ".sync" integrations won't work. Try changing the ARN to
"Resource": "arn:aws:states:::glue:startJobRun"
and you should be able to save your state machine. You will then have to figure out how to set up the Glue task differently, since the state will no longer wait for the job to finish.
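One way to replace the ".sync" behaviour, sketched under assumptions: start the job with `glue:startJobRun` (its output includes a `JobRunId`), then loop through a Wait state, a Lambda that checks the run, and a Choice state on a `done` flag. The Lambda below uses boto3's Glue `get_job_run(JobName=..., RunId=...)`; it assumes the state input carries both `JobName` and `JobRunId`, and the set of terminal states is this sketch's assumption (check the Glue API reference for the full `JobRunState` list). Keep in mind that Express workflows can run for at most five minutes, so long Glue jobs may not fit this pattern at all.

```python
from typing import Any, Dict, Optional

# Terminal Glue JobRunState values assumed by this sketch; verify against the Glue API reference.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"}


def lambda_handler(event: Dict[str, Any], context: Any = None,
                   glue_client: Optional[Any] = None) -> Dict[str, Any]:
    """Report the state of a Glue job run started without ".sync".

    Assumes the event carries JobName and the JobRunId returned by startJobRun.
    """
    if glue_client is None:
        import boto3

        glue_client = boto3.client("glue")

    run = glue_client.get_job_run(JobName=event["JobName"], RunId=event["JobRunId"])
    state = run["JobRun"]["JobRunState"]
    # The Choice state can branch on "done" and then on "JobRunState".
    return {**event, "JobRunState": state, "done": state in TERMINAL_STATES}
```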

Cannot pass array to next task in AWS StepFunction

I'm working on an AWS Step Functions state machine that gets an array of dates from a Lambda call, then passes it to a Task that should take that array as a parameter and pass it into a Lambda.
The Get Date Range task works fine and outputs the date array:
{
"rng": [
"2019-05-07",
"2019-05-09"
]
}
...and the array gets passed into the ProcessDateRange task, but I cannot assign the array to the range parameter.
It literally tries to pass this: "$.rng" instead of this:
[
"2019-05-07",
"2019-05-09"
]
Here's the StateMachine:
{
"StartAt": "Try",
"States": {
"Try": {
"Type": "Parallel",
"Branches": [{
"StartAt": "Get Date Range",
"States": {
"Get Date Range": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:get-date-range",
"Parameters": {
"name": "thename",
"date_query": "SELECT date from sch.tbl_dates;",
"database": "the_db"
},
"ResultPath": "$.rng",
"TimeoutSeconds": 900,
"Next": "ProcessDateRange"
},
"ProcessDateRange": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:process-date-range",
"Parameters": {
"range": "$.rng"
},
"ResultPath": "$",
"Next": "Exit"
},
"Exit": {
"Type": "Succeed"
}
}
}],
"Catch": [{
"ErrorEquals": ["States.ALL"],
"ResultPath": "$.Error",
"Next": "Failed"
}],
"Next": "Succeeded"
},
"Failed": {
"Type": "Fail",
"Cause": "There was an error. Please review the logs.",
"Error": "error"
},
"Succeeded": {
"Type": "Succeed"
}
}
}
This is because you are using the wrong syntax for Lambda tasks. To specify the input, set the InputPath key, for example:
"ProcessDateRange": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:process-date-range",
"InputPath": "$.rng",
"ResultPath": "$",
"Next": "Exit"
},
If you want a parameter to be interpreted as a JSON path instead of a literal string, add ".$" to the end of the parameter name. To modify your example:
"ProcessDateRange": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:process-date-range",
"Parameters": {
"range.$": "$.rng"
},
"ResultPath": "$",
"Next": "Exit"
},
Relevant docs here: https://docs.aws.amazon.com/step-functions/latest/dg/connectors-parameters.html#connectors-parameters-path
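The difference between the two forms can be illustrated with a tiny Python model of how Parameters is resolved: keys ending in `.$` are looked up as paths in the state input, everything else is passed through literally. `resolve_parameters` and `resolve_path` are hypothetical helpers that only handle `$` and dotted paths (no array indexes, filters or intrinsic functions such as `States.StringToJson`).

```python
from typing import Any, Dict


def resolve_path(data: Any, path: str) -> Any:
    """Resolve "$" or a dotted path like "$.a.b" (no arrays or filters in this sketch)."""
    if path == "$":
        return data
    if not path.startswith("$."):
        raise ValueError(f"not a path: {path!r}")
    for part in path[2:].split("."):
        data = data[part]
    return data


def resolve_parameters(params: Dict[str, Any], state_input: Any) -> Dict[str, Any]:
    """Mimic how Parameters treats keys ending in ".$" as paths into the input."""
    resolved: Dict[str, Any] = {}
    for key, value in params.items():
        if key.endswith(".$"):
            resolved[key[:-2]] = resolve_path(state_input, value)  # path lookup
        elif isinstance(value, dict):
            resolved[key] = resolve_parameters(value, state_input)  # nested object
        else:
            resolved[key] = value  # literal, passed through unchanged
    return resolved


state_input = {"rng": ["2019-05-07", "2019-05-09"]}
print(resolve_parameters({"range": "$.rng"}, state_input))
# {'range': '$.rng'}
print(resolve_parameters({"range.$": "$.rng"}, state_input))
# {'range': ['2019-05-07', '2019-05-09']}
```

The first call reproduces the behaviour in the question (the literal string "$.rng" is passed); the second is what the ".$" suffix changes.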