Serverless-ly Query External REST API from AWS and Store Results in S3?

Given a REST API, outside of my AWS environment, which can be queried for json data:
https://someExternalApi.com/?date=20190814
How can I set up a serverless job in AWS to hit the external endpoint on a periodic basis and store the results in S3?
I know that I can instantiate an EC2 instance and just set up a cron job. But I am looking for a serverless solution, which seems to be more idiomatic.
Thank you in advance for your consideration and response.

Yes, you absolutely can do this, and probably in several different ways!
The pieces I would use would be:
- A CloudWatch Event on a cron-like schedule, which then triggers...
- a Lambda function (with the right IAM permissions) that calls the API using e.g. Python's requests or an equivalent HTTP library, and then uses the AWS SDK to write the results to...
- an S3 bucket ready to receive!
This should be all you need to achieve what you want.
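For illustration, a minimal sketch of such a Lambda function in Python (the bucket name, key layout, and date handling are assumptions, and requests must be bundled with your deployment package or provided by a layer):

# Hypothetical sketch: fetch JSON from the external API and store it in S3.
# BUCKET is an illustrative placeholder, not a value from the question.
import json
from datetime import datetime, timezone

import boto3
import requests  # bundle with the deployment package or use a Lambda layer

BUCKET = "my-results-bucket"  # assumption: replace with your bucket
API_URL = "https://someExternalApi.com/"

s3 = boto3.client("s3")

def handler(event, context):
    date = datetime.now(timezone.utc).strftime("%Y%m%d")
    resp = requests.get(API_URL, params={"date": date}, timeout=30)
    resp.raise_for_status()

    key = f"api-results/{date}.json"
    s3.put_object(
        Bucket=BUCKET,
        Key=key,
        Body=json.dumps(resp.json()).encode("utf-8"),
        ContentType="application/json",
    )
    return {"stored": key}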

I'm going to skip the implementation details, as they are largely outside the scope of your question. As such, I'm going to assume your function is already written and targets Node.js.
AWS can do this on its own, but to make it simpler, I'd recommend the Serverless Framework; the rest of this answer assumes you're using it.
Assuming you're entirely new to serverless, the first thing you'll need to do is to create a handler:
serverless create --template "aws-nodejs" --path my-service
This creates a service based on the aws-nodejs template at the provided path. In there, you will find serverless.yml (the configuration for your function) and handler.js (the code itself).
Assuming your function is exported as crawlSomeExternalApi on the handler export (module.exports.crawlSomeExternalApi = () => {...}), the functions entry in your serverless.yml would look like this if you wanted to invoke it every 3 hours:
functions:
  crawl:
    handler: handler.crawlSomeExternalApi
    events:
      - schedule: rate(3 hours)
That's it! All you need now is to deploy it through serverless deploy -v
Under the hood, this creates a CloudWatch Events schedule rule targeting your function. An example can be found in the documentation.

The first thing you need is a Lambda function. Implement your logic of hitting the API and writing data to S3 (or wherever) inside the Lambda function. Next, you need a schedule to periodically trigger your Lambda function. A schedule expression can be used to trigger an event periodically, using either a cron expression or a rate expression. The Lambda function you created earlier should be configured as the target for this CloudWatch rule.
The resulting flow: CloudWatch invokes the Lambda function whenever the rule fires, and Lambda then performs your logic.
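If you set this wiring up programmatically rather than in the console, it could look roughly like this with boto3 (the rule name, schedule, and ARNs are placeholders):

# Hypothetical sketch: create a scheduled rule and point it at an existing Lambda.
import boto3

events = boto3.client("events")
lambda_client = boto3.client("lambda")

FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:crawlSomeExternalApi"

rule = events.put_rule(
    Name="crawl-external-api",
    ScheduleExpression="rate(3 hours)",
    State="ENABLED",
)

# Allow CloudWatch Events to invoke the function.
lambda_client.add_permission(
    FunctionName=FUNCTION_ARN,
    StatementId="allow-cloudwatch-schedule",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule["RuleArn"],
)

events.put_targets(
    Rule="crawl-external-api",
    Targets=[{"Id": "crawl-lambda", "Arn": FUNCTION_ARN}],
)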

Related

What service in AWS has an api to schedule calling a REST api?

GCP has a service called "GCP Cloud Scheduler". I can simply call an API to schedule a REST endpoint call in 45 minutes, or call the API to schedule a recurring call every 24 hours.
What is the AWS equivalent here? I see a bunch of stuff with Lambdas but don't really want the added complexity of needing a Lambda (i.e. GCP functions == Lambdas, and I don't need a GCP function to do what I need in GCP). A Lambda would be one more point of failure I don't really want to monitor.
Two main questions:
1. Is there an equivalent (preferably without Lambdas)?
2. If a Lambda is the only way to go, what service do I call to make sure I can feed the REST endpoint through to the Lambda? (I am really hoping I don't have to create a Lambda per job, as that is even more work.)
I am considering just using GCP's service and having it call our AWS endpoints, as that may be a ton easier, unless anyone knows of an AWS equivalent.
I have not tried anything yet as I can't quite find the correct API in AWS.
Create an EventBridge Rule with an API Destination
Unfortunately, there is no other equivalent in AWS if your endpoint cannot respond within 5 seconds (the invocation timeout for API destinations).
As you mentioned, you'd otherwise need a Lambda to call your endpoint. This Lambda can be triggered at a regular interval using EventBridge. When creating the rule, you can specify a custom input (that could be your endpoint).
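For the no-Lambda route, a rough boto3 sketch of creating an API destination and scheduling it (every name, key, endpoint, and role ARN below is a placeholder):

# Hypothetical sketch: schedule a direct HTTPS call via an EventBridge API destination.
import boto3

events = boto3.client("events")

conn = events.create_connection(
    Name="my-endpoint-conn",
    AuthorizationType="API_KEY",
    AuthParameters={
        "ApiKeyAuthParameters": {"ApiKeyName": "x-api-key", "ApiKeyValue": "secret"}
    },
)

dest = events.create_api_destination(
    Name="my-endpoint",
    ConnectionArn=conn["ConnectionArn"],
    InvocationEndpoint="https://example.com/jobs",  # placeholder endpoint
    HttpMethod="POST",
)

events.put_rule(
    Name="call-endpoint-daily",
    ScheduleExpression="rate(24 hours)",
    State="ENABLED",
)

events.put_targets(
    Rule="call-endpoint-daily",
    Targets=[{
        "Id": "api-dest",
        "Arn": dest["ApiDestinationArn"],
        # Role allowed to call events:InvokeApiDestination (placeholder ARN).
        "RoleArn": "arn:aws:iam::123456789012:role/eventbridge-invoke-api-dest",
    }],
)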

Push data from external API on AWS Kinesis

I am new to the AWS ecosystem. I'm building a (near) real-time system where data comes from an external API. The API is updated every 10 seconds, so I would like to consume and populate my Kinesis pipeline as soon as new data appears.
However, I'm not sure which tool to use for that. I did some research and I think I have two options:
AWS lambda which is triggered every 10 seconds and puts data on Kinesis
AWS StepFunction
What is the standard approach for a given use case?
An AWS Step Functions workflow is typically built from Lambda functions. That is, each step in the workflow can be a Lambda function; you can think of a workflow created by AWS Step Functions as a chain of Lambda functions.
If you are not familiar with how to create a workflow, see this AWS tutorial:
Create AWS serverless workflows by using the AWS SDK for Java
(you can create a Lambda function in any supported programming language. This one happens to use Java).
Now, to answer your question, using a workflow to populate a Kinesis data stream is possible. You can build a Lambda function that gathers data (using logic in your Lambda function) and then invokes the putRecord operation of Kinesis to populate the data stream. You can create a scheduled event that fires every x minutes based on a cron expression.
If you do use a cron expression, you can use the AWS Step Functions API to fire off the workflow. That is, create another Lambda function that is scheduled to fire, say, every 10 minutes. Then, in this Lambda function, use the Step Functions API to invoke the workflow. Now the workflow can populate the Kinesis data stream with data.
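A rough sketch of both pieces in Python (the stream name, state machine ARN, and event shapes are placeholders, not part of the question):

# Hypothetical sketch: a scheduled "starter" Lambda kicks off the workflow,
# and a workflow step writes gathered data to Kinesis.
import json

import boto3

sfn = boto3.client("stepfunctions")
kinesis = boto3.client("kinesis")

STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:ingest"

def start_workflow(event, context):
    # Triggered by a scheduled rule, e.g. rate(10 minutes).
    sfn.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        input=json.dumps({"source": "external-api"}),
    )

def put_to_kinesis(event, context):
    # A workflow step: push the gathered data onto the stream.
    data = event.get("payload", {})  # assumption about the state input shape
    kinesis.put_record(
        StreamName="my-stream",
        Data=json.dumps(data).encode("utf-8"),
        PartitionKey="external-api",
    )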

Using AWS API in order to invoke Lambda functions Asynchronously

I have been researching the AWS documentation on how to invoke Lambda functions, and I've come across different ways to do it. Mainly, Lambda invocation is done by calling the Invoke() function, which can be used to invoke Lambda functions synchronously or asynchronously.
Currently I am invoking my Lambda functions via HTTP request (as a REST API), but an HTTP request times out after 30 seconds, while asynchronous calls, as far as I know, time out after 15 minutes.
What are the advantages, besides the time limit I have already mentioned, of asynchronous Lambda invocation compared to invoking Lambda with an HTTP request? Also, what are the best (recommended) ways to invoke Lambdas in production? In the AWS docs (SDK for Go - https://docs.aws.amazon.com/sdk-for-go/api/service/lambda/#InvokeAsyncInput) I see that InvokeAsyncInput and InvokeAsyncOutput have been deprecated, so I am wondering what an async implementation would actually look like.
Lambda really is about event-driven computing. This means Lambda always gets triggered in response to an event. This event can originate from a wide range of AWS services as well as the AWS CLI and SDK.
All of these events invoke the Lambda function and pass some kind of information in the form of an event and a context object. What this event looks like depends on the service that triggered Lambda. You can find more information about the context in this documentation.
There is no real "best" way to invoke Lambda - this mostly depends on your use case - if you're building a webservice, let API Gateway invoke Lambda for you. If you want to process new files on S3, let S3 trigger Lambda. If you're just testing the Lambda function, you can invoke it via the CLI. If you have custom software that needs to trigger a Lambda function, you can use the SDK. If you want to run Lambda on a schedule, configure CloudWatch Events.
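On the deprecation the question mentions: InvokeAsync is superseded by the regular Invoke call with InvocationType set to Event. A minimal Python sketch (the function name and payload are placeholders):

# Hypothetical sketch: asynchronous invocation via the standard Invoke API.
# InvocationType="Event" queues the call and returns immediately.
import json

import boto3

lambda_client = boto3.client("lambda")

response = lambda_client.invoke(
    FunctionName="my-function",  # placeholder name
    InvocationType="Event",      # async; use "RequestResponse" for sync
    Payload=json.dumps({"key": "value"}).encode("utf-8"),
)
print(response["StatusCode"])    # 202 for async invocations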
Please provide more information about your use case if you require a more detailed evaluation of the available options - right now this is very broad.

AWS SQS trigger Step Functions

Quick question: Is it possible to trigger the execution of a Step Function after an SQS message is sent? If so, how would you specify it in the CloudFormation YAML file?
Thanks in advance.
The first thing to consider is this: do you really need to use SQS to start a Step Functions state machine? Could you use API Gateway instead? Or could you write your messages to an S3 bucket and use CloudWatch Events to start a state machine?
If you must use SQS, then you will need to have a lambda function to act as a proxy. You will need to set up the queue as a lambda trigger, and you will need to write a lambda that can parse the SQS message and make the appropriate call to the Step Functions StartExecution API.
I'm on mobile, so I can't type up the YAML right now, but if you need it, I can try to update with it later. For now, here is a detailed walkthrough of how to invoke a Step Functions state machine from Lambda (including example YAML), and here is a walkthrough of how to use CloudFormation to set up SQS to trigger a Lambda.
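In the meantime, a rough sketch of what that proxy Lambda could look like in Python (the state machine ARN is a placeholder):

# Hypothetical sketch: an SQS-triggered Lambda that starts one Step Functions
# execution per message.
import boto3

sfn = boto3.client("stepfunctions")

STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:my-machine"

def handler(event, context):
    for record in event["Records"]:  # SQS events batch records
        sfn.start_execution(
            stateMachineArn=STATE_MACHINE_ARN,
            input=record["body"],    # pass the message body through
        )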
EventBridge Pipes (launched at re:Invent 2022) allows you to trigger Step Functions state machines without the need for a Lambda function.
https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-pipes.html
You can find an example here:
https://github.com/aws-samples/aws-stepfunctions-examples/blob/main/sam/demo-trigger-stepfunctions-from-sqs/template.yaml

AWS application consistent snapshots of EC2 instances

I'm currently setting up a small Lambda to take snapshots of all the important volumes of our EC2 instances. To guarantee application consistency I need to trigger actions inside the instances: One to quiesce the application before the snapshot and one to wake it up again after the snapshot is done. So far I have no clue how to do this.
I've thought about using SNS or SQS to notify the instances about start and stop of the snapshot, but that has several problems:
I'll need to install (and develop) a custom listener inside the instances.
I'll not get feedback if the quiescing/wake-up is done.
So here's my question: How can I trigger an action inside an instance from a Lambda?
But maybe I'm approaching this from the wrong direction. Is there really no simple backup solution? I know Azure has a snapshot-based backup service that can do application-consistent backups. Did I just miss an equivalent AWS service?
Edit 1:
OK, it looks like the 'Run Command' feature of AWS Systems Manager is what I really need. It allows me to run scripts, Ansible playbooks, and more inside an EC2 instance. When I've got a working solution I'll post the necessary steps.
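For anyone following along, a rough sketch of that flow with boto3 (the instance ID, volume ID, and quiesce commands are placeholders for illustration):

# Hypothetical sketch: quiesce via SSM Run Command, snapshot, then resume.
import time

import boto3

ssm = boto3.client("ssm")
ec2 = boto3.client("ec2")

INSTANCE_ID = "i-0123456789abcdef0"  # placeholder
VOLUME_ID = "vol-0123456789abcdef0"  # placeholder

def run_and_wait(commands):
    # Run shell commands on the instance and poll until they finish.
    cmd = ssm.send_command(
        InstanceIds=[INSTANCE_ID],
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": commands},
    )["Command"]
    while True:
        inv = ssm.get_command_invocation(
            CommandId=cmd["CommandId"], InstanceId=INSTANCE_ID
        )
        if inv["Status"] not in ("Pending", "InProgress"):
            return inv["Status"]
        time.sleep(2)

def handler(event, context):
    run_and_wait(["systemctl stop myapp", "sync"])  # quiesce (placeholder)
    try:
        snap = ec2.create_snapshot(
            VolumeId=VOLUME_ID, Description="app-consistent snapshot"
        )
    finally:
        run_and_wait(["systemctl start myapp"])     # wake up (placeholder)
    return snap["SnapshotId"]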
You can trigger a Lambda function on demand:
Using AWS Lambda with Amazon API Gateway (On-Demand Over HTTPS)
You can invoke AWS Lambda functions over HTTPS. You can do this by defining a custom REST API and endpoint using Amazon API Gateway, and then mapping individual methods, such as GET and PUT, to specific Lambda functions. Alternatively, you could add a special method named ANY to map all supported methods (GET, POST, PATCH, DELETE) to your Lambda function. When you send an HTTPS request to the API endpoint, the Amazon API Gateway service invokes the corresponding Lambda function. For more information about the ANY method, see Step 3: Create a Simple Microservice using Lambda and API Gateway.
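Applied to this use case, a minimal sketch of triggering such an endpoint from a script (the API Gateway URL and payload are hypothetical placeholders):

# Hypothetical sketch: call an API Gateway endpoint that fronts the snapshot Lambda.
import requests

API_URL = "https://abc123.execute-api.us-east-1.amazonaws.com/prod/snapshot"

resp = requests.post(API_URL, json={"instanceId": "i-0123456789abcdef0"}, timeout=30)
resp.raise_for_status()
print(resp.json())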