Invoke a AWS Step functions by API Gateway and wait for the execution results

Invoke a AWS Step functions by API Gateway and wait for the execution results - amazon-web-services

Is it possible to invoke a AWS Step function by API Gateway endpoint and listen for the response (Until the workflow completes and return the results from end step)?
Currently I was able to find from the documentation that step functions are asynchronous by nature and has a final callback at the end. I have the need for the API invocation response getting the end results from step function flow without polling.

I guess that's not possible.
It's async and also there's the API Gateway Timeout
You don't need get the results by polling, you can combine Lambda, Step Functions, SNS and Websockets to get your results real time.
If you want to push a notification to a client (web browser) and you don't want to manage your own infra structure (scaling socket servers and etc) you could use AWS IOT. This tutorial may help you to get started:
http://gettechtalent.com/blog/tutorial-real-time-frontend-updates-with-react-serverless-and-websockets-on-aws-iot.html
If you only need to send the result to a backend (a web service endpoint for example), SNS should be fine.

This will probably work: create an HTTP "gateway" server that dispatches requests to your Steps workflow, then holds onto the request object until it receives a notification that allows it to send a response.
The gateway server will need to add a correlation ID to the payload, and the step workflow will need to carry that through.
One plausible way to receive the notification is with SQS.
Some psuedocode that's vaguely Node/Express flavoured:
const cache = new Cache(); // pick your favourite cache library
const gatewayId = guid(); // this lets us scale horizontally
const subscription = subscribeToQueue({
filter: { gatewayId },
topic: topicName,
});
httpServer.post( (req, res) => {
const correlationId = guid();
cache.add(correlationId, res);
submitToStepWorkflow(gatewayId, correlationId, req);
});
subscription.onNewMessage( message => {
const req = cache.pop(message.attributes.correlationId);
req.send(extractResponse(message));
req.end();
});
(The hypothetical queue reading API here is completely unlike aws-sdk's SQS API, but you get the idea)
So at the end of your step workflow, you just need to publish a message to SQS (perhaps via SNS) ensuring that the correlationId and gatewayId are preserved.
To handle failure, and avoid the cache filling with orphaned request objects, you'd probably want to set an expiry time on the cache, and handle expiry events:
cache.onExpiry( (key, req) => {
req.status(502);
req.send(gatewayTimeoutMessage());
req.end();
}
This whole approach only makes sense for workflows that you expect to normally complete in the sort of times that fit in a browser and proxy timeouts, of course.

Related

How should I handle asynchronous processes that occur after API calls in AWS?

I'm designing the backend for a website that uses API Gateway and Lambda to handle API requests, many of which target a MySQL DB on RDS. Some processes need to happen asynchronously but I'm debating which is best practice or cleaner.
In the given scenario, every time a user creates a new row in a certain table, let's say an email also needs to be sent asynchronously. There are many other scenarios similar to this but this will set precedent.
Option 1: In the lambda that handles the API request, first write to the MySQL instance to add the new row. When the response from MySQL comes back successful, write to something like SQS which will later be read from another lambda that sends an email. When the response from SQS is successful that the record was added to the queue, send a 201 response saying the REST API call was successful.
Option 2: In the lambda that handles the API request, write to the MySQL instance to add the new row. When the response from the MySQL comes back successful, send a 201 response saying the REST API call was successful. Then set up a DMS (data migration service) task that runs indefinitely to send database modification binlogs to a kinesis stream which will trigger a lambda that will handle all DB changes, read the change as a new row in a certain table, and send an email.
Option 1:
less infrastructure
more direct tracking of logic from an API call
1 extra http call (to sqs) delaying response times for an api for a web page
Option 2:
more infrastructure (dms task, replication instance)
scaling out shards may mean loss of ordering when processes binlog events if ordering is a requirement (it is)
side question: Are you able to choose hash key for kinesis for dms tasks from mysql?
a single codebase for reacting to all modifications in the DB may actually make following logic in code simpler
Is this the tradeoff or am I missing something? What is best practice in this scenario?

Option 1 in my view seems most logical, but I would replace SQS and second lambda with SNS. So, modified option 1 could be:
Option 1: In the lambda that handles the API request, first write to the MySQL instance to add the new row. When the response from MySQL comes back successful, publish confirmation message to SNS that sends an email. When the response from SNS is successful send a 201 response saying the REST API call was successful.
This should be faster, cheaper and easier to implement then using SQS and second lambda for sending email.

I have a long running lambda function asynchronously invoked by API gateway. How do I return output via same API gateway once execution completes?

My function takes about 1-2 min to execute while API gateway has 30 second timeout. To overcome this I followed AWS documentation and enabled InvocationType:Event header. The issue is that I receive 200 response after execution, but how do I receive my output? I can see lambda output in cloudwatch but what happens to it next?
Code used for making request
var xhttp = new XMLHttpRequest();
xhttp.open("POST", "https://my-endpoint.com", true);
xhttp.setRequestHeader("Content-type", "application/json");
xhttp.setRequestHeader("InvocationType", "Event");
xhttp.send("foo");
xhttp.onreadystatechange = function() {
if (this.readyState == 4 && this.status == 200) {
console.log(this.responseText); //no output here
}
};
If I send a synchronous request by removing InvocationType header then I get desired result.

When dealing to asynchronous functions, you have two ways out:
Use a callback url: on your first request, you pass as parameter (or previously configure) a callback url that your function will call when have the result. This is the most efficient way and widely used on many APIs. It's also called web hook
Store your lambda function response anywhere and your api client must pool for results. This is not so efficient because you will need to do multiple requests until the lambda function finishes, increasing your costs.
The flow on the first method is like:
Hey API, do my long duration task, and when ready call me back on https://sample-domain.com/callback-url
API processes the task, and access the given callback url with the result as payload
The client receives the callback request and processes the result as desired.
There's some security concerns here: someone can discover your callback url and try doing requests to fake something or just trying to DDoS attack.
To mitigate this possibility, you can use random urls for the callback, adding something that identifies the original request.

How to subscribe AWS Lambda to Salesforce Platform Events

We want to integrate Salesforce into out Micro Service Structure in AWS.
There is a article about this here
So we want to subscribe lambda to certain platform events in salesforce.
But i found no code examples for this. I gave it a try using node.js (without lambda). This works great:
var jsforce = require('jsforce');
var username = 'xxxxxxxx';
var password = 'xxxxxxxxxxx';
var conn = new jsforce.Connection({loginUrl : 'https://test.salesforce.com'});
conn.login(username, password, function(err, userInfo) {
if (err) { return console.error(err); }
console.error('Connected '+userInfo);
conn.streaming.topic("/event/Contact_Change__e").subscribe(function(message) {
console.dir(message);
});
});
But i am not sure if this is the right way to do it in lambda.

My understanding of Salesforce Platform Events is that they use CometD under the hood. CometD allows the HTTP client (your code) to subscribe to events published by the HTTP server.
This means your client code needs to be running and be in a state where it is subscribed and listening for server events for the duration of time that you expect to be receiving events. In most cases, this duration is indefinate i.e. your client code expects to wait forever in a subscribed state, ready to receive events.
This is at odds with AWS Lambda functions, which are expected to complete execution in a relatively short amount of time (max 15 minutes last time I checked).
I would suggest you need a long running process, such as a nodejs application running in Elastic Beanstalk, or in a container. The nodejs application can stay running indefinately, in a subscribed state. Each time it receives an event, it could call your AWS Lambda function in order to implement the required actions.

Asynchronous HTTP request in AWS Lambda

I am wanting to execute a http request inside of a lambda function, invoked by API Gateway. The problem is, the request takes a bit of time to complete (<20 seconds) and don't want the client waiting for a response. In my research on asynchronous requests, I learned that I can pass the X-Amz-Invocation-Type:Event header to make the request execute asynchronously, however this isn't working and the code still "waits" for the http request to complete.
Below is my lambda code:
'use strict';
const https = require('https');
exports.handler = function (event, context, callback) {
let requestUrl;
requestUrl = event.queryStringParameters.url;
https.get(requestUrl, (res) => {
console.log('statusCode:', res.statusCode);
console.log('headers:', res.headers);
res.on('data', (d) => {
process.stdout.write(d);
});
}).on('error', (e) => {
console.error(e);
});
let response = {
"statusCode": 200,
"body": JSON.stringify(event.queryStringParameters)
};
callback(null, response);
};
Any help would be appreciated.

You can use two Lambda functions.
Lambda 1 is triggered by API Gateway then calls Lambda 2 asynchronously (InvocationType = Event) then returns a response to the user.
Lambda 2, once invoked, will trigger the HTTP request.

Whatever you do, don't use two lambda functions.
You can't control how lambda is being called, async or sync. The caller of the lambda decides that. For APIGW, it has decided to call lambda sync.
The possible solutions are one of:
SQS
Step Functions (SF)
SNS
In your API, you call out to one of these services, get back a success, and then immediately return a 202 to your caller.
If you have a high volume of single or double action execution use SQS. If you have potentially long running with complex state logic use SF. If you for someone reason want to ignore my suggestions, use SNS.
Each of these can (and should) call back out to a lambda. In the case that you need to run more than 15 minutes, they can call back out to CodeBuild. Ignore the name of the service, it's just a lambda that supports up to 8 hour runs.
Now, why not use two lambdas (L1, L2)? The answer is simple. Once you respond that your async call was queued (SQS, SF, SNS), to your users (202). They'll expect that it works 100%. But what happens if that L2 lambda fails. It won't retry, it won't continue, and you may never know about it.
That L2 lambda's handler no longer exist, so you don't know the state any more. Further, you could try to add logging to L2 with wrapper try/catch but so many other types of failures could happen. Even if you have that, is CloudWatch down, will you get the log? Possible not, it just isn't a reliable strategy. Sure if you are doing something you don't care about, you can build this arch, but this isn't how real production solutions are built. You want a reliable process. You want to trust that the baton was successfully passed to another service which take care of completing the user's transaction. This is why you want to use one of the three services: SQS, SF, SNS.

Api gateway get output results from step function?

I followed tutorial on creating and invoking step functions
I'm getting output in my GET request of api as
{
"executionArn": "arn:aws:states:ap-northeast-1:123456789012:execution:HelloWorld:MyExecution",
"startDate": 1.486772644911E9
}
But, instead of above response I want my step functions output, which is given by end state as below.
{
"name":"Hellow World"
}
How to achieve this?

Update: You can now use Express Step Functions for synchronous requests.
AWS Step Functions are asynchronous and do not immediately return their results. API Gateway methods are synchronous and have a maximum timeout of 29 seconds.
To get the function output from a Step Function, you have to add a second method in API Gateway which will call the Step Function with the DescribeExecution action. The API Gateway client will have to call this periodically (poll) until the returned status is no longer "RUNNING".
Here's the DescribeExecution documentation

Use Express Step Functions instead. This type of Step Functions can be called synchronously. Go to your API Gateway and in the Integration Request section make sure you have the StartSyncExecution action:
After that, go a bit lower in the same page to the Mapping Templates:
and include the following template for the application/json Content-Type:
#set($input = $input.json('$'))
{
"input": "$util.escapeJavaScript($input)",
"stateMachineArn": "arn:aws:states:us-east-1:your_aws_account_id:stateMachine:your_step_machine_name"
}
After that, go back to the Method Execution and go to the Integration Response and then to the Mapping Templates section:
And use the following template to have a custom response from your lambda:
#set ($parsedPayload = $util.parseJson($input.json('$.output')))
$parsedPayload
My testing Step Function is like this:
And my Lambda Function code is:
Deploy your API Gateway stage.
Now, if you go to Postman and send a POST request with any json body, now you have a response like this:

New Synchronous Express Workflows for AWS Step Functions is the answer:
https://aws.amazon.com/blogs/compute/new-synchronous-express-workflows-for-aws-step-functions/
Amazon API Gateway now supports integration with Step Functions StartSyncExecution for HTTP APIs:
https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-api-gateway-supports-integration-with-step-functions-startsyncexecution-http-apis/

First of all the step functions executes asynchronously and API Gateway is only capable of invoking the step function (Starting a flow) only.
If you are waiting for the results of a step function invocation from a web application, you can use AWS IOT WebSockets for this. The steps are as follows.
Setup AWS IOT topic with WebSockets.
Configure the API Gateway and Step functions invocation.
From the Web Frontend subscribe to the IOT Topic as a WebSocket listener.
At the last step (And in error steps) in the Step Functions workflow use AWS SDK to trigger the IOT Topic which will broadcast the results to the Web App running in the browser using WebSockets.
For more details on WebSockets with AWS IOT refer the medium article Receiving AWS IoT messages in your browser using websockets.

Expanding on what #MikeD at AWS says, if you're certain that the Step Function won't exceed the 30 second timeout, you could create a lambda that executes the step function and then blocks as it polls for the result. Once it has the result, it can return it.
It is a better idea to have the first call return immediately with the execution id, and then pass that id into a second call to retrieve the result, once it's finished.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js