AWS Step Functions and Batch: Dynamic Command

I have a Batch job with a single job definition whose container command depends on a parameter. The original value is "--param2=XXX", but I need it to be dynamic, taken from the Step Functions execution input:
{
"param2": "--param2=YYY"
}
I haven't been able to get the Step Function to replace the value with the input value. This was my attempt:
{
"Step1": {
"Type": "Task",
"Resource": "arn:aws:states:::batch:submitJob.sync",
"Parameters": {
"JobDefinition": "arn:aws:batch:us-east-2:zzzzzzzzz:job-definition/XXXXXX",
"JobQueue": "arn:aws:batch:us-east-2:zzzzzzzz:job-queue/YYYYYY",
"JobName": "Step1",
"ContainerOverrides": {
"Environment": [
{
"Name": "envparam",
"Value": "0"
}
],
"Command": [
"python",
"run.py",
"--param=val",
"$.param2"
]
}
},
"Next": "Step2"
}
}

I found a solution: pass the value as a Batch job parameter and reference it in the command with Ref::Param2. A plain "$.param2" string inside the Command array is passed through verbatim, since only keys ending in .$ are treated as paths.
This is the complete code:
{
"Step1": {
"Type": "Task",
"Resource": "arn:aws:states:::batch:submitJob.sync",
"Parameters": {
"JobDefinition": "arn:aws:batch:us-east-2:zzzzzzzzz:job-definition/XXXXXX",
"JobQueue": "arn:aws:batch:us-east-2:zzzzzzzz:job-queue/YYYYYY",
"JobName": "Step1",
"Parameters": {
"Param2.$": "$.param2"
},
"ContainerOverrides": {
"Environment": [
{
"Name": "envparam",
"Value": "0"
}
],
"Command": [
"python",
"run.py",
"--param=val",
"Ref::Param2"
]
}
},
"Next": "Step2"
}
}
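For this to work end to end, the job definition's container command needs the matching Ref:: placeholder, and it can carry a default value so the job still runs when submitted outside Step Functions. A minimal sketch of such a job definition, assuming a container job (the image, vCPU, and memory values are illustrative placeholders):
{
  "jobDefinitionName": "XXXXXX",
  "type": "container",
  "parameters": {
    "Param2": "--param2=XXX"
  },
  "containerProperties": {
    "image": "123456789012.dkr.ecr.us-east-2.amazonaws.com/myimage:latest",
    "resourceRequirements": [
      { "type": "VCPU", "value": "1" },
      { "type": "MEMORY", "value": "2048" }
    ],
    "command": ["python", "run.py", "--param=val", "Ref::Param2"]
  }
}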

Related

Map executing another step function does not pass any input

I can't figure this one out: the state machine executed inside a Map state ALWAYS receives an empty dict as input. I need the input to be {"id": "xxxx"}.
Here is the Map state:
"Map": {
"Type": "Map",
"ItemProcessor": {
"ProcessorConfig": {
"Mode": "INLINE"
},
"StartAt": "InsideStep",
"States": {
"InsideStep": {
"Type": "Task",
"Resource": "arn:aws:states:::states:startExecution.sync:2",
"Parameters": {
"StateMachineArn": "arn:aws:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
},
"End": true
}
}
},
"Next": "Consolidate logs",
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"Next": "Log error",
"ResultPath": "$.error"
}
],
"ItemsPath": "$.ids_list",
"MaxConcurrency": 10,
"Parameters": {
"id.$": "$$.Map.Item.Value"
}
}
Thanks for any help.
Here is the working solution, thanks to luk2302's suggestions. The Map-level Parameters block did reshape each item to {"id": <item>}, but the nested startExecution call defines the entire API request, and without an explicit Input field the child execution starts with an empty object. Dropping the Map-level Parameters and forwarding each item explicitly via Input fixes it:
"Map": {
"Type": "Map",
"ItemProcessor": {
"ProcessorConfig": {
"Mode": "INLINE"
},
"StartAt": "InsideStep",
"States": {
"InsideStep": {
"Type": "Task",
"Resource": "arn:aws:states:::states:startExecution.sync:2",
"Parameters": {
"StateMachineArn": "arn:aws:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
"Input": {
"id.$": "$"
}
},
"End": true
}
}
},
"Next": "Consolidate logs",
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"Next": "Log error",
"ResultPath": "$.error"
}
],
"ItemsPath": "$.ids_list",
"MaxConcurrency": 10
}
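As a sanity check: with a state input along these lines (ids_list as in the question; the id values are made-up placeholders), each child execution now receives {"id": "id-001"}, {"id": "id-002"}, and so on:
{
  "ids_list": ["id-001", "id-002", "id-003"]
}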

AWS Step Functions: Input Parameter

I have a very simple workflow using Fargate containers. The containers simply return their inputs.
The workflow input is:
{
"value": "FALSE"
}
The Choice state operates as expected: $.value evaluates to "FALSE", so it runs the ECS RunTask-FALSE task.
The container should receive and return the string "FALSE"; instead it returns the literal string "$.value". For some reason the input value is not being parsed, and looking at the task run in the ECS console confirms that the literal string is passed as the command.
How can I pass the starting parameter to this task?
{
"Comment": "A description of my state machine",
"StartAt": "Choice",
"States": {
"Choice": {
"Type": "Choice",
"Choices": [
{
"Variable": "$.value",
"StringEquals": "FALSE",
"Next": "ECS RunTask-FALSE"
}
],
"Default": "ECS RunTask-TRUE"
},
"ECS RunTask-FALSE": {
"Type": "Task",
"Resource": "arn:aws:states:::ecs:runTask.sync",
"Parameters": {
"LaunchType": "FARGATE",
"Cluster": "arn:aws:ecs:us-east-2:xxxxxxxxxxx:cluster/portal",
"TaskDefinition": "arn:aws:ecs:us-east-2:xxxxxxxxxxx:task-definition/simple:4",
"Overrides": {
"ContainerOverrides": [
{
"Name": "simple",
"Command": ["$.value"]
}
]
},
"NetworkConfiguration": {
"AwsvpcConfiguration": {
"Subnets": [
"subnet-001f85595e8af43cf",
"subnet-05e742358ac59ae04"
],
"SecurityGroups": [
"sg-0948e5328861ae667"
],
"AssignPublicIp": "ENABLED"
}
}
},
"End": true
},
"ECS RunTask-TRUE": {
"Type": "Task",
"Resource": "arn:aws:states:::ecs:runTask.sync",
"Parameters": {
"LaunchType": "FARGATE",
"Cluster": "arn:aws:ecs:us-east-2:xxxxxxxxxxx:cluster/portal",
"TaskDefinition": "arn:aws:ecs:us-east-2:xxxxxxxxxxx:task-definition/simple:4",
"NetworkConfiguration": {
"AwsvpcConfiguration": {
"Subnets": [
"subnet-001f85595e8af43cf",
"subnet-05e742358ac59ae04"
],
"SecurityGroups": [
"sg-0948e5328861ae667"
],
"AssignPublicIp": "ENABLED"
}
}
},
"End": true
}
}
}
"Command.$": "States.Array($.value)"
Parameters: For key-value pairs where the value is selected using a path, the key name must end in .$.
States.Array Intrinsic Function: The interpreter returns a JSON array containing the values of the arguments in the order provided.
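Putting that together, the override block in the ECS RunTask-FALSE state becomes the following (a sketch based on the task definition above); the container then receives "FALSE" as its command:
"Overrides": {
  "ContainerOverrides": [
    {
      "Name": "simple",
      "Command.$": "States.Array($.value)"
    }
  ]
}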

Accessing a state inside a Parallel block in an AWS Step Function from another branch

I have created a step function as you can see in the picture. Now I need to execute StepX after StepK (and then the ChoiceA flow will end). So StepX should be executed in parallel with StepY->StepZ, as it is now, but also after StepK. I cannot find a way to reference a state that is inside a Parallel block. Is there a way around this?
Here is my Json-
{
"StartAt": "DataPointsExtractor",
"States": {
"DataPointsExtractor": {
"Type": "Task",
"Resource": "arn:aws:lambda:*******",
"Next": "PathDecider"
},
"PathDecider": {
"Type": "Choice",
"Choices": [
{
"Variable": "$.path_type",
"StringEquals": "ChoiceA",
"Next": "ChoiceA"
},
{
"Variable": "$.path_type",
"StringEquals": "ChoiceB",
"Next": "ChoiceB"
}
],
"Default": "NoMatchesState"
},
"ChoiceA": {
"Type": "Task",
"Resource": "arn:aws:lambda:*******",
"Next": "StepK"
},
"StepK": {
"Type": "Task",
"Resource": "arn:aws:lambda:*******",
"End": true
},
"ChoiceB": {
"Type": "Task",
"Resource": "arn:aws:lambda:*******",
"Next": "ParallelStates"
},
"ParallelStates": {
"Type": "Parallel",
"Branches": [
{
"StartAt": "StepX",
"States": {
"StepX": {
"Type": "Task",
"Resource": "arn:aws:lambda:*******",
"End": true
}
}
},
{
"StartAt": "StepY",
"States": {
"StepY": {
"Type": "Task",
"Resource": "arn:aws:lambda:*******",
"Next": "StepZ"
},
"StepZ": {
"Type": "Task",
"Resource": "arn:aws:lambda:*******",
"End": true
}
}
}
],
"End": true
},
"NoMatchesState": {
"Type": "Fail",
"Cause": "No Matches!"
}
}
}
You should keep it simple. Since ChoiceA and ChoiceB are separate flows, they don't need to intersect. StepX can simply be used twice, though you will have to give the second copy a different name, because state names must be unique across the whole state machine (StepX_AfterK below).
Definition:
{
"StartAt": "DataPointsExtractor",
"States": {
"DataPointsExtractor": {
"Type": "Task",
"Resource": "arn:aws:lambda:*******",
"Next": "PathDecider"
},
"PathDecider": {
"Type": "Choice",
"Choices": [
{
"Variable": "$.path_type",
"StringEquals": "ChoiceA",
"Next": "ChoiceA"
},
{
"Variable": "$.path_type",
"StringEquals": "ChoiceB",
"Next": "ChoiceB"
}
],
"Default": "NoMatchesState"
},
"ChoiceA": {
"Type": "Task",
"Resource": "arn:aws:lambda:*******",
"Next": "StepK"
},
"StepK": {
"Type": "Task",
"Resource": "arn:aws:lambda:*******",
"Next": "StepX"
},
"StepX": {
"Type": "Task",
"Resource": "arn:aws:lambda:*******",
"End": true
},
"ChoiceB": {
"Type": "Task",
"Resource": "arn:aws:lambda:*******",
"Next": "ParallelStates"
},
"ParallelStates": {
"Type": "Parallel",
"Branches": [
{
"StartAt": "StepX",
"States": {
"StepX": {
"Type": "Task",
"Resource": "arn:aws:lambda:*******",
"End": true
}
}
},
{
"StartAt": "StepY",
"States": {
"StepY": {
"Type": "Task",
"Resource": "arn:aws:lambda:*******",
"Next": "StepZ"
},
"StepZ": {
"Type": "Task",
"Resource": "arn:aws:lambda:*******",
"End": true
}
}
}
],
"End": true
},
"NoMatchesState": {
"Type": "Fail",
"Cause": "No Matches!"
}
}
}
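The relevant change is just the tail of the ChoiceA flow, shown here in isolation (StepX_AfterK is a name chosen for illustration; any unique name works):
"StepK": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:*******",
  "Next": "StepX_AfterK"
},
"StepX_AfterK": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:*******",
  "End": true
}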

Is it possible to execute step concurrency for AWS EMR through AWS Step Functions without Lambda?

This is my scenario: I'm trying to create 4 AWS EMR clusters, where each cluster is assigned 2 jobs, so it's 4 clusters with 8 jobs orchestrated using Step Functions.
My flow should be: 4 clusters start at the same time, running 8 jobs in parallel, with each cluster running 2 of those jobs in parallel.
Recently, AWS launched StepConcurrencyLevel in EMR, a feature that runs 2 or more steps in a single cluster simultaneously to reduce the cluster's runtime; it can be configured through the EMR console, the AWS CLI, or AWS Lambda.
But I want to launch 2 or more jobs in parallel in a single cluster using AWS Step Functions itself, with its state machine language in the format described here: https://docs.aws.amazon.com/step-functions/latest/dg/connect-emr.html
I've consulted many sites, and I keep finding solutions for the console or for boto3 in AWS Lambda, but I couldn't find one for doing this through Step Functions itself.
Is there any solution for this?
Thanks in advance.
So, I went through a few more sites and found a solution to my issue.
The problem was StepConcurrencyLevel: I could set it through the AWS console, the AWS CLI, or Python with boto3, but I wanted a solution in the state machine language itself, and I found one.
All we have to do is specify StepConcurrencyLevel (for example 2 or 3; the default is 1) while creating the cluster in the state machine definition, then create the 4 steps under that cluster and run the state machine.
The cluster recognizes the concurrency level that was set and runs its steps accordingly.
My sample process:
The JSON script of my orchestration:
{
"StartAt": "Create_A_Cluster",
"States": {
"Create_A_Cluster": {
"Type": "Task",
"Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
"Parameters": {
"Name": "WorkflowCluster",
"StepConcurrencyLevel": 2,
"Tags": [
{
"Key": "Description",
"Value": "process"
},
{
"Key": "Name",
"Value": "filename"
},
{
"Key": "Owner",
"Value": "owner"
},
{
"Key": "Project",
"Value": "roject"
},
{
"Key": "User",
"Value": "user"
}
],
"VisibleToAllUsers": true,
"ReleaseLabel": "emr-5.28.1",
"Applications": [
{
"Name": "Spark"
}
],
"ServiceRole": "EMR_DefaultRole",
"JobFlowRole": "EMR_EC2_DefaultRole",
"LogUri": "s3://prefix/prefix/log.txt/",
"Instances": {
"KeepJobFlowAliveWhenNoSteps": true,
"InstanceFleets": [
{
"InstanceFleetType": "MASTER",
"TargetSpotCapacity": 1,
"InstanceTypeConfigs": [
{
"InstanceType": "m4.xlarge",
"BidPriceAsPercentageOfOnDemandPrice": 90
}
]
},
{
"InstanceFleetType": "CORE",
"TargetSpotCapacity": 1,
"InstanceTypeConfigs": [
{
"InstanceType": "m4.xlarge",
"BidPriceAsPercentageOfOnDemandPrice": 90
}
]
}
]
}
},
"Retry": [
{
"ErrorEquals": [
"States.ALL"
],
"IntervalSeconds": 5,
"MaxAttempts": 1,
"BackoffRate": 2.5
}
],
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"Next": "Fail_Cluster"
}
],
"ResultPath": "$.cluster",
"OutputPath": "$.cluster",
"Next": "Add_Steps_Parallel"
},
"Fail_Cluster": {
"Type": "Task",
"Resource": "arn:aws:states:::sns:publish",
"Parameters": {
"TopicArn": "arn:aws:sns:us-west-2:919490798061:rsac_error_notification",
"Message.$": "$.Cause"
},
"Next": "Terminate_Cluster"
},
"Add_Steps_Parallel": {
"Type": "Parallel",
"Branches": [
{
"StartAt": "Step_One",
"States": {
"Step_One": {
"Type": "Task",
"Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
"Parameters": {
"ClusterId.$": "$.ClusterId",
"Step": {
"Name": "The first step",
"ActionOnFailure": "TERMINATE_CLUSTER",
"HadoopJarStep": {
"Jar": "command-runner.jar",
"Args": [
"spark-submit",
"--deploy-mode",
"cluster",
"--master",
"yarn",
"--conf",
"spark.dynamicAllocation.enabled=true",
"--conf",
"maximizeResourceAllocation=true",
"--conf",
"spark.shuffle.service.enabled=true",
"--py-files",
"s3://prefix/prefix/pythonfile.py",
"s3://prefix/prefix/pythonfile.py"
]
}
}
},
"Retry": [
{
"ErrorEquals": [
"States.ALL"
],
"IntervalSeconds": 5,
"MaxAttempts": 1,
"BackoffRate": 2.5
}
],
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"ResultPath": "$.err_mgs",
"Next": "Fail_SNS"
}
],
"ResultPath": "$.step1",
"Next": "Terminate_Cluster_1"
},
"Fail_SNS": {
"Type": "Task",
"Resource": "arn:aws:states:::sns:publish",
"Parameters": {
"TopicArn": "arn:aws:sns:us-west-2:919490798061:rsac_error_notification",
"Message.$": "$.err_mgs.Cause"
},
"ResultPath": "$.fail_cluster",
"Next": "Terminate_Cluster_1"
},
"Terminate_Cluster_1": {
"Type": "Task",
"Resource": "arn:aws:states:::elasticmapreduce:terminateCluster.sync",
"Parameters": {
"ClusterId.$": "$.ClusterId"
},
"End": true
}
}
},
{
"StartAt": "Step_Two",
"States": {
"Step_Two": {
"Type": "Task",
"Resource": "arn:aws:states:::elasticmapreduce:addStep",
"Parameters": {
"ClusterId.$": "$.ClusterId",
"Step": {
"Name": "The second step",
"ActionOnFailure": "TERMINATE_CLUSTER",
"HadoopJarStep": {
"Jar": "command-runner.jar",
"Args": [
"spark-submit",
"--deploy-mode",
"cluster",
"--master",
"yarn",
"--conf",
"spark.dynamicAllocation.enabled=true",
"--conf",
"maximizeResourceAllocation=true",
"--conf",
"spark.shuffle.service.enabled=true",
"--py-files",
"s3://prefix/prefix/pythonfile.py",
"s3://prefix/prefix/pythonfile.py"
]
}
}
},
"Retry": [
{
"ErrorEquals": [
"States.ALL"
],
"IntervalSeconds": 5,
"MaxAttempts": 1,
"BackoffRate": 2.5
}
],
"Catch": [
{
"ErrorEquals": [
"States.ALL"
],
"ResultPath": "$.err_mgs_1",
"Next": "Fail_SNS_1"
}
],
"ResultPath": "$.step2",
"Next": "Terminate_Cluster_2"
},
"Fail_SNS_1": {
"Type": "Task",
"Resource": "arn:aws:states:::sns:publish",
"Parameters": {
"TopicArn": "arn:aws:sns:us-west-2:919490798061:rsac_error_notification",
"Message.$": "$.err_mgs_1.Cause"
},
"ResultPath": "$.fail_cluster_1",
"Next": "Terminate_Cluster_2"
},
"Terminate_Cluster_2": {
"Type": "Task",
"Resource": "arn:aws:states:::elasticmapreduce:terminateCluster.sync",
"Parameters": {
"ClusterId.$": "$.ClusterId"
},
"End": true
}
}
}
],
"ResultPath": "$.steps",
"Next": "Terminate_Cluster"
},
"Terminate_Cluster": {
"Type": "Task",
"Resource": "arn:aws:states:::elasticmapreduce:terminateCluster.sync",
"Parameters": {
"ClusterId.$": "$.ClusterId"
},
"End": true
}
}
}
In this state machine definition, I set StepConcurrencyLevel to 2 while creating the cluster, and added 2 Spark jobs as steps under it.
When I ran this in Step Functions, I was able to orchestrate the cluster and run 2 steps concurrently in it, without configuring anything directly in the EMR console, through the AWS CLI, or with boto3.
I used only the state machine language to orchestrate 2 concurrently running steps in a single cluster under Step Functions, with no help from other services like Lambda, the Livy API, or boto3.
This is how the flow diagram looks:
[Flow diagram: AWS Step Functions workflow for concurrent step execution]
To be more accurate, here is where I inserted StepConcurrencyLevel in the above state machine definition:
"Create_A_Cluster": {
"Type": "Task",
"Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
"Parameters": {
"Name": "WorkflowCluster",
"StepConcurrencyLevel": 2,
"Tags": [
{
"Key": "Description",
"Value": "process"
},
Under Create_A_Cluster.
Thank You.

How to pass Step Function input to Batch Job

What's the proper way to send part of a Step Function's input to a Batch Job?
I've tried setting an env var using Parameters.ContainerOverrides.Environment like this:
"Parameters": {
"ContainerOverrides": {
"Environment": [
{
"Name": "PARAM_1",
"Value": "$.param_1"
}
The Step Function input looks like this:
{
"param_1": "value-goes-here"
}
But the Batch job just ends up getting invoked with the literal string "$.param_1" in the PARAM_1 env var.
Fixed. The Value key simply needed the ".$" suffix:
"Parameters": {
"ContainerOverrides": {
"Environment": [
{
"Name": "PARAM_1",
"Value.$": "$.param_1"
}
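In context, a complete task state using this fix might look like the following (a sketch; the job definition, queue, and state names are placeholders matching the next answer):
"MyStepTask": {
  "Type": "Task",
  "Resource": "arn:aws:states:::batch:submitJob.sync",
  "Parameters": {
    "JobDefinition": "myjobdef",
    "JobName": "myjobname",
    "JobQueue": "myjobqueue",
    "ContainerOverrides": {
      "Environment": [
        {
          "Name": "PARAM_1",
          "Value.$": "$.param_1"
        }
      ]
    }
  },
  "Next": "MyNextStepTask"
}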
Another option: pass it in "Parameters" (within the parent "Parameters"). Note that all parameter values are strings:
"MyStepTask": {
"Type": "Task",
"Resource": "arn:aws:states:::batch:submitJob.sync",
"Parameters": {
"JobDefinition": "myjobdef",
"JobName": "myjobname",
"JobQueue": "myjobqueue",
"Parameters": { "p_param1":"101",
"p_param2":"201"
}
},
"Next": "MyNextStepTask"
}
If you want to pass parameters to Batch, add a Parameters section inside the parent Parameters section (not great naming!):
"MyStepTask": {
"Type": "Task",
"Resource": "arn:aws:states:::batch:submitJob.sync",
"Parameters": {
"JobDefinition": "myjobdef",
"JobName": "myjobname",
"JobQueue": "myjobqueue",
"Parameters": {
"Name": "PARAM_1",
"Value.$": "$.param_1"
}
},
"Next": "MyNextStepTask"
}
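For the Batch-Parameters route, the value only reaches the container if the job definition's command references it with a Ref:: placeholder, as with Ref::Param2 in the first question above (run.py is borrowed from that question as an illustration):
"command": ["python", "run.py", "Ref::PARAM_1"]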