AWS Express Step Function execution within Express Step Function

In a Standard Workflow we can happily invoke another Standard Workflow using:
{
  "Type": "Task",
  "Resource": "arn:aws:states:::states:startExecution.sync:2",
  "Parameters": {
    "StateMachineArn": "${NestedStateMachineArn}",
    ...
  }
  ...
When we try to do the same with an Express Workflow we of course get the error "Express state machine does not support '.sync' service integration". That is stated by AWS, so it's expected behaviour.
Is there another way to execute an Express Workflow from another Express Workflow and somehow get the execution result/output? As a last resort I could use a Lambda function to execute the nested workflow synchronously and wait for the response; that said, it would increase cost by having a function needlessly waiting on the state machine.
I tried to look around but couldn't find this documented anywhere.

You can execute Express executions synchronously using the StartSyncExecution API, which is now supported as an AWS SDK integration in Step Functions using "Resource": "arn:aws:states:::aws-sdk:sfn:startSyncExecution".
"NestedExpressWorkflow": {
"Type": "Task",
"Resource": "arn:aws:states:::aws-sdk:sfn:startSyncExecution",
"Parameters": {
"StateMachineArn": <your_express_state_machine>,
"Input.$": "$"
},
"Next": "NextState"
}

You can execute another workflow, you just can't wait for the results. I believe you just need to remove .sync from the resource. If you need to wait for the results of the second workflow, you won't be able to do that within an Express Workflow.
From Service Integrations with AWS Step Functions
Standard Workflows and Express Workflows support the same set of service integrations but do not support the same integration patterns. Express Workflows do not support Run a Job (.sync) or Wait for Callback (.waitForTaskToken). For more information, see Standard vs. Express Workflows.

Related

How to add shareIdentifier to AWS EventBridge rule for scheduled execution of AWS Batch job

I configured an AWS EventBridge rule (via the web GUI) for running an AWS Batch job. The rule is triggered, but I am getting the following error after invocation:
shareIdentifier must be specified. (Service: AWSBatch; Status Code: 400; Error Code: ClientException; Request ID: 07da124b-bf1d-4103-892c-2af2af4e5496; Proxy: null)
My job uses a scheduling policy and needs shareIdentifier to be set, but I don't know how to set it. Here is a screenshot from the rule configuration:
There are no additional settings for further job arguments/parameters; the only thing I can configure is retries. I also checked the aws-cli command for putting a rule (https://awscli.amazonaws.com/v2/documentation/api/latest/reference/events/put-rule.html), but it doesn't seem to have any additional settings either. Any suggestions on how to solve this? Or working examples?
Edited:
I ended up using the Java SDK for AWS Batch: https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-batch. I have a scheduled method that periodically spawns jobs with the following piece of code:
import com.amazonaws.services.batch.AWSBatch;
import com.amazonaws.services.batch.AWSBatchClientBuilder;
import com.amazonaws.services.batch.model.SubmitJobRequest;
import com.amazonaws.services.batch.model.SubmitJobResult;

// Build a Batch client and submit the job with an explicit share identifier
AWSBatch client = AWSBatchClientBuilder.standard().withRegion("eu-central-1").build();
SubmitJobRequest request = new SubmitJobRequest()
        .withJobName("example-test-job-java-sdk")
        .withJobQueue("job-queue")
        .withShareIdentifier("default")
        .withJobDefinition("job-type");
SubmitJobResult response = client.submitJob(request);
log.info("job spawn response: {}", response);
Have you tried providing additional settings to your target via the input transformer, as referenced in the AWS docs AWS Batch Jobs as EventBridge Targets?
FWIW I'm running into the same problem.
I had a similar issue: from the CLI and the GUI I just couldn't find a way to pass ShareIdentifier from an EventBridge rule. In the end I had to use a state machine (Step Function) instead:
"States": {
"Batch SubmitJob": {
"Type": "Task",
"Resource": "arn:aws:states:::batch:submitJob.sync",
"Parameters": {
"JobName": <name>,
"JobDefinition": <Arn>,
"JobQueue": <QueueName>,
"ShareIdentifier": <Share>
},
...
As you can see, it handles ShareIdentifier fine.
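If the job still needs to fire on the EventBridge schedule, the rule can target the state machine rather than AWS Batch directly. A rough sketch of the CLI call (the rule name, ARNs, and role are placeholders; the role must allow states:StartExecution on the state machine):
aws events put-targets --rule my-batch-schedule --targets '[{
  "Id": "submit-batch-job",
  "Arn": "arn:aws:states:eu-central-1:123456789012:stateMachine:SubmitBatchJob",
  "RoleArn": "arn:aws:iam::123456789012:role/eventbridge-invoke-stepfn"
}]'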

SNS mail notification when a step is not kicked off within a threshold timeframe

I have an EMR step which is submitted through a Step Function. During the run I can see the task is submitted, but the EMR step is not executed and the EMR console doesn't have any information.
How can I debug this?
How can I send an SNS notification when a step doesn't start executing within a threshold timeframe? In my case the Step Function shows the EMR task as submitted, but there is no information in the EMR console, and the pipeline has been running without failing for more than half an hour.
You could start the debugging process through the Step Functions execution log and identify the specific step that has failed; from there you can move on to the EMR console or the specific service that failed. Usually, when the EMR step doesn't appear in the EMR console, it is due to a runtime error caused by an exception raised when calling the EMR step.
For this scenario, you can use the error handling that Step Functions provides via the Catch and TimeoutSeconds fields; you can find more details in the AWS Step Functions error-handling documentation.
Basically, you need to add these fields as shown below:
{
  "StartAt": "EmrStep",
  "States": {
    "EmrStep": {
      "Type": "Task",
      "Resource": "arn:aws:emr:execute-X-step",
      "Comment": "This is your EMR step",
      "TimeoutSeconds": 10,
      "Catch": [ {
        "ErrorEquals": ["States.Timeout"],
        "Next": "ShutdownClusterAndSendSNS"
      } ],
      "End": true
    },
    "ShutdownClusterAndSendSNS": {
      "Type": "Pass",
      "Comment": "This step handles the timeout exception raised",
      "Result": "You can shut down the EMR cluster to avoid increased cost here, and later send an SNS notification!",
      "End": true
    }
  }
}
Note: to catch the timeout, you have to catch the States.Timeout error, but you can also define Catch fields for other types of error.
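For the notification itself, Step Functions has a direct SNS service integration, so the Pass state above could be replaced with a real publish. A minimal sketch (the topic ARN is a placeholder; actually terminating the cluster would need a separate state, e.g. the elasticmapreduce:terminateCluster integration):
"ShutdownClusterAndSendSNS": {
  "Type": "Task",
  "Resource": "arn:aws:states:::sns:publish",
  "Parameters": {
    "TopicArn": "arn:aws:sns:us-east-1:123456789012:emr-step-alerts",
    "Message": "EMR step did not start within the expected timeframe"
  },
  "End": true
}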

Recursive AWS Lambda function calls - Best Practice

I've been tasked to look at a service built on AWS Lambda that performs a long-running task of turning VMs on and off. Mind you, I come from the Azure team, so I am not familiar with the styling or best practices of AWS services.
The approach the original developer took is to send the entire workload to one Lambda function, have that function process a slice of it, and then recursively call itself with the remaining workload until all items are gone (workload length = 0).
Pseudo-ish Code:
// Assume this gets sent to a HTTP Lambda endpoint as a whole
let workload = [1, 2, 3, 4, 5, 6, 7, 8]

// The Lambda HTTP endpoint
function Lambda(workload) {
  if (!workload.length) {
    return "No more work!"
  }
  const toDo = workload.splice(0, 2) // get first two items
  doWork(toDo)
  // Then... except it builds a new HTTP request with the aws sdk
  Lambda(workload) // 3, 4, 5, 6, 7, 8, etc.
}
This seems highly inefficient and unreliable (correct me if I am wrong). There is a lot of state being stored in this process and in my opinion that creates a lot of failure points.
My plan is to suggest we re-engineer the entire service to use a Queue/Worker type framework instead, where ideally the endpoint would handle one workload at a time, and be stateless.
The queue would be populated by a service (Jenkins? Lambda? Manually?), then a second service would read from the queue (and ideally scale-out as well, as needed).
UPDATE: AWS EventBridge now looks like the preferred solution.
It's "Coupling" that I was thinking of, see here: https://www.jeffersonfrank.com/insights/aws-lambda-design-considerations
Coupling
Coupling goes beyond Lambda design considerations; it's more about the system as a whole. Lambdas within a microservice are sometimes tightly coupled, but this is nothing to worry about as long as the data passed between Lambdas within their little black box of a microservice is not over pure HTTP and isn't synchronous.
Lambdas shouldn’t be directly coupled to one another in a Request Response fashion, but asynchronously. Consider the scenario when an S3 Event invokes a Lambda function, then that Lambda also needs to call another Lambda within that same microservice and so on.
[diagram: Lambdas directly coupled to one another]
You might be tempted to implement direct coupling, like allowing Lambda 1 to use the AWS SDK to call Lambda 2 and so on. This introduces some of the following problems:
If Lambda 1 invokes Lambda 2 synchronously, it needs to wait for the latter to finish first. Lambda 1 might not know that Lambda 2 also called Lambda 3 synchronously, so Lambda 1 may now need to wait for both Lambda 2 and 3 to finish successfully. Lambda 1 might time out as it waits for all the Lambdas to complete first, and you're also paying for each Lambda while it waits.
What if Lambda 3 has a concurrency limit set and is also called by another service? The call between Lambda 2 and 3 will fail until it has concurrency again. The error can be returned all the way back to Lambda 1, but what does Lambda 1 then do with the error? It has to store that the S3 event was unsuccessful and that it needs to replay it.
This process can be redesigned to be event-driven: [diagram: decoupled, event-driven design with queues between Lambdas]
Not only is this the solution to all the problems introduced by the direct coupling method, but it also provides a method of replaying the DLQ if an error occurred for each Lambda. No message will be lost or need to be stored externally, and the demand is decoupled from the processing.
AWS Step Functions is one way you can achieve this. Step Functions are used to orchestrate multiple Lambda functions in any manner you want - parallel executions, sequential executions or a mix of both. You can also put wait steps, condition checks, retries in between if you need.
Your overall step function might look something like this (say you want Tasks 1, 2 and 3 to execute in parallel; when all of these are complete, you want to execute Task 4, and then again 5 and 6 in parallel):
Configuring this is also pretty simple. It accepts a JSON definition like the following:
{
  "Comment": "An example of the Amazon States Language using a parallel state to execute two branches at the same time.",
  "StartAt": "Parallel",
  "States": {
    "Parallel": {
      "Type": "Parallel",
      "Next": "Task4",
      "Branches": [
        {
          "StartAt": "Task1",
          "States": {
            "Task1": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:ap-south-1:XXX:function:XXX",
              "End": true
            }
          }
        },
        {
          "StartAt": "Task2",
          "States": {
            "Task2": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:ap-south-1:XXX:function:XXX",
              "End": true
            }
          }
        },
        {
          "StartAt": "Task3",
          "States": {
            "Task3": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:ap-south-1:XXX:function:XXX",
              "End": true
            }
          }
        }
      ]
    },
    "Task4": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:ap-south-1:XXX:function:XXX",
      "Next": "Parallel2"
    },
    "Parallel2": {
      "Type": "Parallel",
      "Next": "Final State",
      "Branches": [
        {
          "StartAt": "Task5",
          "States": {
            "Task5": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:ap-south-1:XXX:function:XXX",
              "End": true
            }
          }
        },
        {
          "StartAt": "Task6",
          "States": {
            "Task6": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:ap-south-1:XXX:function:XXX",
              "End": true
            }
          }
        }
      ]
    },
    "Final State": {
      "Type": "Pass",
      "End": true
    }
  }
}
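For a list-shaped workload like the original [1, 2, ..., 8] array, a Map state is arguably an even more direct fit than fixed parallel branches: it runs the same worker once per item with a concurrency cap, replacing the recursion entirely. A minimal sketch (the ItemsPath and function ARN are placeholders):
"ProcessWorkload": {
  "Type": "Map",
  "ItemsPath": "$.workload",
  "MaxConcurrency": 2,
  "Iterator": {
    "StartAt": "DoWork",
    "States": {
      "DoWork": {
        "Type": "Task",
        "Resource": "arn:aws:lambda:ap-south-1:XXX:function:XXX",
        "End": true
      }
    }
  },
  "End": true
}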

Cognito User Migration Trigger - Exception during user migration - Exception Location

We're using a Lambda function to respond to the 'User Migration' trigger in AWS Cognito. When something like a syntax error occurs, you can see it in the CloudWatch logs. However, the "Exception during user migration" errors seen on the login page are nowhere to be found in the CloudWatch logs.
Where are we supposed to look for these? I can't find anything in the documentation and assumed it would have gone to CloudWatch.
I can't test it in the Lambda interface because one of the parameters being passed into the Lambda function has a function nested within the object, and I can't create a test JSON setup that includes that. There's also no pre-built test trigger for user migration.
Any ideas as to why I can't see this in CloudWatch, or where the exceptions would be shown, would be greatly appreciated.
Unfortunately Cognito doesn't expose any logs (or metrics, for that matter!).
The closest you can get is to view the Lambda's logs in CloudWatch. If you log your response and watch your Lambda's error metric, then you should mostly be able to debug issues internal to the Lambda.
This does leave a few edge cases:
You won't see anything if the Lambda can't be invoked (this would only happen under heavy concurrent loads, either on that single Lambda or on all Lambdas across your account).
If you return a bad response, the Lambda will succeed but the trigger action will fail, and Cognito will give you a fairly generic message. At this point you're at the mercy of AWS' documentation to work out what's wrong (which can be a bit hit and miss, although StackOverflow always helps!).
You can find an example payload for the lambda in the trigger documentation:
{
  "userName": "THE USERNAME",
  "request": {
    "password": "THE PASSWORD"
  },
  "response": {
    // it is your responsibility to fill this bit in and return the completed object back:
    "userAttributes": {
      "string": "string",
      ...
    },
    "finalUserStatus": "string",
    "messageAction": "string",
    "desiredDeliveryMediums": [ "string", ... ],
    "forceAliasCreation": boolean
  }
}
N.B. As an aside, which you might already know: Lambda payloads always have to be JSON, which cannot store functions. So you should always be able to derive a test payload to use in the console.
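For instance, a hypothetical completed response for a successful migration might look like the following (the attribute values are made up; "CONFIRMED" plus "SUPPRESS" is the usual combination, so the migrated user is not forced to reset their password or sent a welcome message):
{
  "userName": "THE USERNAME",
  "request": {
    "password": "THE PASSWORD"
  },
  "response": {
    "userAttributes": {
      "email": "user@example.com",
      "email_verified": "true"
    },
    "finalUserStatus": "CONFIRMED",
    "messageAction": "SUPPRESS"
  }
}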

Lambda function not working upon Alexa Skill invocation

I've just created my first (custom) skill. I've set the function up in Lambda by uploading a zip file containing my index.js and all the necessary code, including node_modules and the base Alexa skill that mine is a child of (as per the tutorials). I made sure I zipped up the files and sub-folders, not the folder itself (as I can see this is a common cause of similar errors), but when I create the skill and test in the web harness with a sample utterance I get:
remote endpoint could not be called, or the response it returned was
invalid.
I'm not sure how to debug this, as there's nothing logged in CloudWatch.
I can see in the Lambda request that my slot value is translated/parsed successfully and the intent name is correct.
In AWS Lambda I can invoke the function successfully, both with a LaunchRequest and with another named intent. From the developer console, though, I get nothing.
I've tried copying the JSON from the Lambda test (that works) to the developer portal and I get the same error. Here is a sample of the JSON I'm putting in the dev portal (that works in Lambda):
{
  "session": {
    "new": true,
    "sessionId": "session1234",
    "attributes": {},
    "user": {
      "userId": null
    },
    "application": {
      "applicationId": "amzn1.echo-sdk-ams.app.149e75a3-9a64-4224-8bcq-30666e8fd464"
    }
  },
  "version": "1.0",
  "request": {
    "type": "LaunchRequest",
    "requestId": "request5678"
  }
}
The first step in pursuing this problem is probably to test your Lambda separately from your skill configuration.
When looking at your Lambda function in the AWS console, note the 'Test' button at the top; next to it there is a drop-down with an option to configure a test event. If you select that option you will find that there are preset test events for Alexa. Choose 'Alexa Start Session' and then choose the 'Save and test' button.
This will give you more detailed feedback about the execution of your lambda.
If your Lambda works fine here, then the problem probably lies in your skill configuration, so I would go back through whatever tutorial and documentation you were using to configure your skill and make sure you did it right.
When you write that the Lambda request looks fine, I assume you are talking about the service simulator, so that's a good start, but there could still be a problem on the configuration tab.
We built a tool for local skill development and testing.
BST Tools
Requests and responses from Alexa will be sent directly to your local server, so that you can quickly code and debug without having to do any deployments. I have found this to be very useful for our own development.
Let me know if you have any questions.
It's open source: https://github.com/bespoken/bst