Design a high available API service with database backend on AWS - amazon-web-services

I am working on an API layer which serves requests from a backend database.
So the requirements are:
Repopulate the whole table without downtime for API service: A main requirement for the API is that we should be able to re-populate the tables( 2to 3 tables, structured csv like data) in backend database periodically (bi-weekly or monthly), but the API service should not go down.
low latency globally in the order of 100s of millisecond
scalability with requests per second
rate limit clients
also switch to previous versions of the tables in the backend in case of issues.
My questions are about what kinds of AWS database that i can use and other AWS components which can achieve the above goals.

If you want a secure, low latency global API, then I would go with an edge optimized API Gateway API.
Documentation is here API GW limits regarding the maximum requests per second.
You can rate limit clients using API GW. Also, you can have different stages in API GW that correspond to different aliases in lambda. Lambda would be your serverless compute layer that would handle your API GW requests and in turn query your database. Using versioning and aliasing in lambda would allow you to switch to different database tables. Given that you are planning to use csv like data, you could go with RDS and use the Aurora engine, which is compliant with MySQL and PostgreSQL, and is an extremely cost effective option.
As some additional information, you should use lambda proxy integration between your API GW APIs and your lambda functions. This allows you to enable Identity and Access Management (IAM) for your APIs.
Documentation on Lambda proxy integration: Lambda proxy integration
Here is some documentation on Lambda: AWS Lambda versioning and aliases
Here is some documentation on RDS Aurora: AWS RDS Aurora

Related

AWS HTTP API Gateway as a proxy to private S3 bucket

I have a private S3 bucket with lots of small files. I'd like to expose the contents of the bucket (only read-only access) using AWS API Gateway as a proxy. Both S3 bucket and AWS API Gateway belong to the same AWS account and are in the same VPC and Availability Zone.
AWS API Gateway comes in two types: HTTP API, REST API. The configuration options of REST API are more advanced, additionally, REST API supports much more AWS services integrations than the HTTP API. In fact, the use case I described above is fully covered in one of the documentation tabs of REST API. However, REST API has one huge disadvantage - it's about 70% more expensive than the HTTP API, the price comes with more configuration options but as for now, I need only one - integration with the S3 service that's why I believe this type of service is not well suited for my use case. I started searching if HTTP API can be integrated with S3, and so far I haven't found any way to achieve it.
I tried creating/editing service-linked roles associated with the HTTP API Gateway instance, but those roles can't be edited (only read-only access). As for now, I don't have any idea where I should search next, or if my goal is even achievable using HTTP API.
I am a fan of AWSs HTTP APIs.
I work daily with an API that serves a very similar purpose. The way I have done it is by using AWS Lambda functions integrated with the APIs paths.
What works for me is this:
Define your API paths, and integrate them with AWS Lambda functions.
Have your integrated Lambda function return a signed URL for any objects you want to provide access to through API calls.
There are several different ways to pass the name of the object(s) you want to the Lambda function servicing the API call.
This is the short answer. I plan to give a longer answer at a later time. But this has worked for me.

How to charge users by usage?

I’m building a service and I’m planning to charge a fixed price for each lambda call.
How to count requests per client if the lambda function being called is the same? I’m planning to pass a client id
You can use API Gateway Usage Plans for your requirement.
After you create, test, and deploy your APIs, you can use API Gateway usage plans to make them available as product offerings for your customers. You can configure usage plans and API keys to allow customers to access selected APIs at agreed-upon request rates and quotas that meet their business requirements and budget constraints. If desired, you can set default method-level throttling limits for an API or set throttling limits for individual API methods.
A usage plan specifies who can access one or more deployed API stages and methods—and also how much and how fast they can access them. The plan uses API keys to identify API clients and meters access to the associated API stages for each key. It also lets you configure throttling limits and quota limits that are enforced on individual client API keys.
Read this docs for more detail explanation.
You can use api gateway https://aws.amazon.com/api-gateway/
"Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. APIs act as the "front door" for applications to access data, business logic, or functionality from your backend services."
It provides you with statistics about usage as well as different options like limit numbers of requests per api_key, etc

Is it possible to remove API Gateway from the equation to serve Lambda over public internet?

Currently, my application resides in lambda which I serve using HTTP API (API Gateway V2). This setup exists in multiple regions. Meaning, API Gateway invokes lambda in the same region which accesses DynamoDB Global Table in the same region. I use Route 53 to serve nearest API Gateway to user.
The problem I faced: API Gateway doesn't support redirection from http to https. I can achieve this with CloudFront. But, it'll increase cost as well as latency.
Can I remove API Gateway from the equation and use Lambda#Edge to access DynamoDB Table near the user? Can CloudFront be used to replace API Gateway?
Yes you can. The docs write:
Functions triggered by origin request and response events as well as functions triggered by viewer request and response events can make network calls to resources on the internet, and to AWS services such as Amazon S3 buckets, DynamoDB tables, or Amazon EC2 instances.
However, there are many limitations to what lambda#edge can do, as compared to a regular lambda. Examples are:
only python and nodejs,
difficulty in debugging, as lambda logs will be in region when it runs, not in one central region,
timeout limits on calls to DynamoDb (5 or 30 seconds) depending if its origin or viewer function,
no lambda layers
max memory of 128 MB for viewer side functions
deployment package size can be max 1 MB for viewer side functions
Thus if you can work with these and other limitations of lambda#edge, then you can use it to work with DynamoDb.

Application Load Balancers vs API Gateway

AWS comes with a service called Application Load Balancer and it could be a trigger to a lambda function. The way to call such a lambda function is by sending an HTTP/HTTPS request to ALB.
Now my question is how this is any different from using the API Gateway? And when should one use ALB over API Gateway (or the way around)?
One of the biggest reasons we use API gateway in front of our lambda functions instead of using an ALB is the native IAM (Identity and Access Management) integration that API GW has. We don't have to do any of the identity work ourselves, it's all delegated to IAM, and in addition to that, API GW has built-in request validation including validation of query string parameters and headers. In a nutshell, there are so many out of the box integrations what come with API GW, you wind up having to do a lot more work if you go the route of using an ALB.
It seems that the request/response limit is lower when using ALB, and WebSockets are not supported:
The maximum size of the request body that you can send to a Lambda
function is 1 MB. For related size limits, see HTTP Header Limits.
The maximum size of the response JSON that the Lambda function can
send is 1 MB.
WebSockets are not supported. Upgrade requests are rejected with an
HTTP 400 code.
See: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/lambda-functions.html
Payload limit with API Gateway is discussed here: Request payload limit with AWS API Gateway
Also the article already mentioned by #matesio provides information about additional things to consider when choosing between ALB and API Gateway.
Notable tweet referenced in the mentioned article:
If you are building an API and want to leverage AuthN/Z, request
validation, rate limiting, SDK generation, direct AWS service backend,
use #APIGateway. If you want to add Lambda to an existing web app
behind ALB you can now just add it to the needed route.
(From: Dougal Ballantyne, the Head of Product for Amazon API Gateway)
API gateways usually are richer in functionality than Load balancers. In addition to load balancing, API gateways often capable to do the following:
Content based based routing (some calls to v1 and some calls to v2 and so on, based on certain criteria)
IAM related functionality (eg: access validation )
Security (eg: SSL offloading, DDOS attack prevention, security credentials translation - eg: translating particular type of token to another, etc)
Payload translation (eg: XML to Json, etc)
Additionally, API gateways may be available in appliance form - and appliances are usually of low-latency, far more secure, etc.
I am not aware of specific features of AWS API gateway, but the above ones are general features of any API gateway. Nevertheless, when you have an option to use either LB or API gateway to offer a service on internet, API gateway is usually a better option, unless there are specific reasons to choose otherwise.

AWS serverless architecture – Why should I use API gateway?

Here is my use case:
Static react frontend hosted on s3
Python backend on lambda conduction long running data analysis
Postgres database on rds
Backend and frontend communicate exclusively with JSON
Occasionally backend creates and stores powerpoint files in s3 bucket and then serves them up by sending s3 link to frontend
Convince me that it is worthwhile going through all the headaches of setting up API gateway to connect the frontend and backend rather than invoking lambda directly from the frontend!
Especially given the 29s timeout which is not long enough for my app meaning I need to implement asynchronous processing and add a whole other layer of aws architecture (messaging, queuing and polling with SNS and SQS) which increases cost, time and potential for problems. I understand there are some security concerns, is there no way to securely invoke a lambda function?
You are talking about invoking a lambda directly from JavaScript running on a client machine.
I believe the only way to do that would be embedding the AWS SDK for JavaScript in your react frontend. See:
https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/browser-invoke-lambda-function-example.html
There are several security concerns with this, only some of which can be mitigated.
First off, you will need to hardcode AWS credentials in to your frontend for the world to see. The access those credentials have can be limited in scope, but be very careful to get this right, or otherwise you'll be paying for someone's cryptomining operation.
Assuming you only want certain people to upload files to a storage service you are paying for, you will need some form of authentication and authorisation. API Gateway doesn't really do authentication, but it can do authorisation, albeit by connecting to other AWS services like Cognito or Lambda (custom authorizers). You'll have to build this into your backend Lambda yourself. Absolutely doable and probably not much more effort than using a custom authorizer from the API Gateway.
The main issue with connecting to Lambda direct is that Lambda has the ability to scale rapidly, which can be an issue if someone tries to hit you with a denial of service attack. Lambda is cheap, but running 1000 concurrent instances 24 hours a day is going to add up.
API Gateway allows you rate limit per second/minute/hour/etc., Lambda only allows you to limit the number of concurrent instances at any given time. So if you were to set that limit at 1, an attacker could cause that 1 instance to run for 24 hours a day.