AWS serverless architecture – Why should I use API gateway? - amazon-web-services

Here is my use case:
Static react frontend hosted on s3
Python backend on lambda conduction long running data analysis
Postgres database on rds
Backend and frontend communicate exclusively with JSON
Occasionally backend creates and stores powerpoint files in s3 bucket and then serves them up by sending s3 link to frontend
Convince me that it is worthwhile going through all the headaches of setting up API gateway to connect the frontend and backend rather than invoking lambda directly from the frontend!
Especially given the 29s timeout which is not long enough for my app meaning I need to implement asynchronous processing and add a whole other layer of aws architecture (messaging, queuing and polling with SNS and SQS) which increases cost, time and potential for problems. I understand there are some security concerns, is there no way to securely invoke a lambda function?

You are talking about invoking a lambda directly from JavaScript running on a client machine.
I believe the only way to do that would be embedding the AWS SDK for JavaScript in your react frontend. See:
https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/browser-invoke-lambda-function-example.html
There are several security concerns with this, only some of which can be mitigated.
First off, you will need to hardcode AWS credentials in to your frontend for the world to see. The access those credentials have can be limited in scope, but be very careful to get this right, or otherwise you'll be paying for someone's cryptomining operation.
Assuming you only want certain people to upload files to a storage service you are paying for, you will need some form of authentication and authorisation. API Gateway doesn't really do authentication, but it can do authorisation, albeit by connecting to other AWS services like Cognito or Lambda (custom authorizers). You'll have to build this into your backend Lambda yourself. Absolutely doable and probably not much more effort than using a custom authorizer from the API Gateway.
The main issue with connecting to Lambda direct is that Lambda has the ability to scale rapidly, which can be an issue if someone tries to hit you with a denial of service attack. Lambda is cheap, but running 1000 concurrent instances 24 hours a day is going to add up.
API Gateway allows you rate limit per second/minute/hour/etc., Lambda only allows you to limit the number of concurrent instances at any given time. So if you were to set that limit at 1, an attacker could cause that 1 instance to run for 24 hours a day.

Related

Restricting AWS Secrect for Single AWS Lambda

Experts,
I am using AWS Api gateways as a proxy over AWS Lambda for my client calls. API Gateway has a timeout of 29000 milliseconds which is becoming an issue for one of our Lambda call requires around 2 minutes to complete.
Memory is not an issue - the operation is a bit time consuming.
One of the dirty but quickest ways to solve this issue to call the lambda directly from the client app skipping the AWS API gateway however I have to then hardcode the access key and secret in the client app.
As we are struggling with this issue on production the quickest and dirty solution seems to be the way forward for now and then later on replaced with WebSockets.
Still, to avoid major security issue I was thinking about whether can we restrict the access key and secret to only 1 lambda function

How to display data from DynamoDB on an AWS S3 hosted static webpage

I have a system which uses AWS IoT to send data to a DynamoDB table. This is simple weather data and stores time, temperature, humidity and pressure. So far that side is working well.
I want to display data from this table on a webpage hosted on S3. In its most simple form this would just display the latest row of data. I had though that this would be a case of a simple client-side javascript to query the database, but looking on Amazon it gets quite complicated with Lambda functions called through API gateway using IAM to certify.
Is there a simpler way to go about this? Data should be publicly readable, and non-writeable, so I thought should be easier than what I have read so far.
Please have a look at the Simple Web Service CDK Pattern. It helps you create a simple end-to-end service using API Gateway, a Lambda function, and access to a DynamoDB table with just a few lines of code. It is available in multiple programming languages.
As a general note: Whenever you want to provide dynamic content, you need some kind of application that takes care of it. An API Gateway backed by an AWS Lambda function is no more complicated than running a web server with all the undifferentiated heavy lifting like network configuration, firewall setup, OS patching, and maintenance. And proper handling of identity and access control needs to be done in any case.
If you really want to just display the latest row, and you prefer to keep you webpage as static as possible, I would consider just writing out the latest row of dynamodb to a simple json file using whatever backend process you want, then that file can be consumed by your front end application without having to worry about IAM Credentials or even the AWS JS SDK - keep it as simple and lightweight as possible.
No need to repeatedly hit your dynamodb to pull back the same data for each page load either - which should also save you some money in the long run.
There is a in-browser JavaScript SDK. It allows the JavaScript on your web page to make calls directly to DynamoDB, without having to make API calls through API Gateway that trigger a Lambda function that itself makes calls to DynamoDB on your behalf.
The main consideration here is how to authenticate your web browser client. It cannot make DynamoDB API calls without AWS credentials. I'm assuming that you have no server-side component that can return AWS credentials to the client.
You could embed read-only, minimally-permissioned AWS credentials in the page itself - this is definitely not a best practice, but it's simple and in some cases might be acceptable. Other options include requiring the web site user to authenticate via Web Identity Federation, which would dynamically yield a set of usable, time-limited AWS credentials. Another, more sophisticated option, would be AWS Amplify which supports both authenticated and unauthenticated clients. This is generally preferable to hard-coding (read-only) AWS credentials in a web page but is a little more complex to set up.
There is also a blog post: Dynamic Websites Using the AWS SDK for JavaScript in the Browser.
In all these scenarios, the page itself would make API calls directly to DynamoDB, and you should ensure that the IAM role associated with the credentials is heavily restricted (just the relevant DynamoDB table(s) and just the necessary query/get permissions).

Flask web application deployment via AWS Lambda

I am very new to AWS Lambda and am struggling to understand its functionalities based on many examples I found online (+reading endless documentations). I understand the primary goal of using such service is to implement a serverless architecture that is cost and probably effort-efficient by allowing Lambda and the API Gateway to take on the role of managing your server (so serverless does not mean you don't use a server, but the architecture takes care of things for you). I organized my research into two general approaches taken by developers to deploy a Flask web application to Lambda:
Deploy the entire application to Lambda using zappa-and zappa configurations (json file) will be the API Gateway authentication.
Deploy only the function, the parsing blackbox that transforms user input to a form the backend endpoint is expecting (and backwards as well) -> Grab a proxy url from the API Gateway that configures Lambda proxy -> Have a separate application program that uses the url
(and then there's 3, which does not use the API Gateway and invokes the Lambda function in the application itself-but I really want to get a hands on experience using the API Gateway)
Here are the questions I have for each of the above two approaches:
For 1, I don't understand how Lambda is calling the functions in the Flask application. According to my understanding, Lambda only calls functions that have the parameters event and context-or are the url calls (url formulated by the API Gateway) actually the events calling the separate functions in the Flask application, thus enabling Lambda to function as a 'serverless' environment-this doesn't make sense to me because event, in most of the examples I analyzed, is the user input data. That means some of the functions in the application have no event and some do, which means Lambda somehow magically figures out what to do with different function calls?
I also know that Lambda does have limited capacity, so is this the best way? It seems to be the standard way of deploying a web application on Lambda.
For 2, I understand the steps leading to incorporating the API Gateway urls in the Flask application. The Flask application therefore will use the url to access the Lambda function and have HTTP endpoints for user access. HOWEVER, that means, if I have the Flask application on my local computer, the application will only be hosted when I run the application on my computer-I would like it to have persistent public access (hopefully). I read about AWS Cloud9-would this be a good solution? Where should I deploy the application itself to optimize this architecture-without using services that take away the architecture's severless-ness (like an EC2 instance maybe OR on S3, where I would be putting my frontend html files, and host a website)? Also, going back to 1 (sorry I am trying to organize my ideas in a coherent way, and it's not working too well), will the application run consistently as long as I leave the API Gateway endpoint open?
I don't know what's the best practice for deploying a Flask application using AWS Lambda and the API Gateway but based on my findings, the above two are most frequently used. It would be really helpful if you could answer my questions so I could actually start playing with AWS Lambda! Thank you! (+I did read all the Amazon documentations, and these are the final-remaining questions I have before I start experimenting:))
Zappa has its own code to handle requests and make them compatible with the "Flask" format. Keep in mind that you aren't really using Flask as intended in either cases when using Lambda. Lambdas are invoked only when calls are made, flask usually keeps running looking for requests. But the continuously running part is handled by the API Gateway here. Zappa essentially creates a single ANY request on API gateway, this request is passed to your lambda handler which interprets it and uses it to invoke your flask function.
If you you are building API Gateway + Lambda you don't need to use Flask. It would be much easier to simply make a function that is invoked by the parameters passed to lambda handler by the API Gateway. The front end application you can host on S3 (if it is static or Angular).
I would say the best practice here would be to not use Flask and go with the API Gateway + Lambda option. This lets you put custom security and checks on your APIs as well as making the application a lot more stable as every request has its own lambda.

Design a high available API service with database backend on AWS

I am working on an API layer which serves requests from a backend database.
So the requirements are:
Repopulate the whole table without downtime for API service: A main requirement for the API is that we should be able to re-populate the tables( 2to 3 tables, structured csv like data) in backend database periodically (bi-weekly or monthly), but the API service should not go down.
low latency globally in the order of 100s of millisecond
scalability with requests per second
rate limit clients
also switch to previous versions of the tables in the backend in case of issues.
My questions are about what kinds of AWS database that i can use and other AWS components which can achieve the above goals.
If you want a secure, low latency global API, then I would go with an edge optimized API Gateway API.
Documentation is here API GW limits regarding the maximum requests per second.
You can rate limit clients using API GW. Also, you can have different stages in API GW that correspond to different aliases in lambda. Lambda would be your serverless compute layer that would handle your API GW requests and in turn query your database. Using versioning and aliasing in lambda would allow you to switch to different database tables. Given that you are planning to use csv like data, you could go with RDS and use the Aurora engine, which is compliant with MySQL and PostgreSQL, and is an extremely cost effective option.
As some additional information, you should use lambda proxy integration between your API GW APIs and your lambda functions. This allows you to enable Identity and Access Management (IAM) for your APIs.
Documentation on Lambda proxy integration: Lambda proxy integration
Here is some documentation on Lambda: AWS Lambda versioning and aliases
Here is some documentation on RDS Aurora: AWS RDS Aurora

Prevent AWS Lambda flooding

I'm considering about moving my service from a VPS to AWS Lambda + DynamoDB to use it as a FaaS, because it's basically 2 API GET calls that fetch info from the database and serve it, and the normal use of those API calls are really rare (about 50 times a week)
But it makes me wonder... As I can't setup a limit on how many calls I want to serve each month, some attacker could theoretically flood my service by calling it a couple thousands times a day and make my AWS bill extremely expensive. Setting up a limit per month wouldn't be a nice idea either, because the attacker could flood the first day and I won't have more requests to serve. The ideal thing would be to set up a limit on request rate per client.
Anyone knows how could I protect it? I've seen that AWS also offers a Firewall, but that's for CloudFront. Isn't there any way to make it work with Lambda directly?
You can put AWS CloudFront in front API Gateway and Lambda so that, the traffic will be served to outside through CloudFront.
In addition by configuring AWS WAF with rate base blocking, it is possible to block high frequencies of access by attackers.
However when configuring AWS CloudFront in front of API Gateway and Lambda, you also need to restrict direct access to API Gateway (Since API Gateway will be publicly accessible by default). This can be achieved in following ways.
Enable API Keys for API Gateway and use the API Key in AWS CloudFront Headers in the Origin.
Use a Token Header and Verify it using a Custom Authorizer Lambda function.
Two options spring to mind:
place API Gateway in front of Lambda so that API requests
have to be authenticated. API Gateway also has built-in throttles and other useful features.
invoke the Lambda directly, which will require the client
invoking the Lambda to have the relevant IAM credentials.