Log requests/responses in API Gateway through Lambda proxy - amazon-web-services

I would like to log the complete requests and responses, including bodies, received on API Gateway in an AWS Lambda proxy, while passing the requests on for processing to a different server (i.e., reverse-proxying the requests). Because the standard logging from API Gateway to CloudWatch truncates requests/responses after 1024 bytes, I cannot use this option.
So the processing would look like this:
Request -> API Gateway -> Lambda to log full request incl. body -> public API endpoint -> Response -> Lambda to log full response incl. body -> API Gateway -> Response
Is there a known solution for this scenario?

You probably have a good reason for that, but make sure you are aware that logging full request/response bodies may have a lot of undesired implications.
For example, if your service requires GDPR compliance, this is a huge issue. It may also greatly affect performance and make you run into quota-related issues. Basically, it is usually not a good idea to do that.
Storing these logs in CloudWatch would be the easiest option for requests where that 1 KB limit is not an issue. If you have just a handful of requests where this is not enough, you could consider treating them as exceptions.
You could use S3 / DynamoDB / Elasticsearch, depending on what you want to do with the data; each comes with trade-offs.
S3 - This would allow you to store very large requests/responses, but it can create a lot of fragmentation. You may end up with a lot of small files, and you also need some sort of index (probably storing the S3 key in CloudWatch Logs). Searching can be somewhat painful in this case (although you may be able to use Athena, depending on how you store the data).
DynamoDB - Easy to store, but you can run into a lot of quota limits if your API is accessed too frequently. You may need to raise your costs significantly to prevent them. Also, each record has a limit of 400 KB. I personally don't recommend this approach.
Elasticsearch - The default record size limit is 100 MB, but this can be increased. It would make it easy to query this data later on.
I'd say Elasticsearch is the most appropriate for this case, given the amount of information involved. Also, depending on the volume, some of these solutions would eventually require some publish-subscribe mechanism (e.g., Kinesis) in between to handle burst limits and message grouping (if you are using S3, for example, you may want to group multiple entries into a single file).
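If you do go down this road, the proxy itself is straightforward. Here is a minimal sketch of the logging Lambda proxy in Python, assuming a Lambda proxy integration; the upstream URL is a hypothetical placeholder, header forwarding and error handling are omitted, and the logs go to the Lambda's own CloudWatch log group (which is not subject to API Gateway's 1024-byte execution-log truncation, though CloudWatch Logs has its own per-event size limit):

    # Sketch of a logging reverse proxy behind API Gateway (proxy integration).
    # UPSTREAM is a hypothetical placeholder for the real backend.
    import json
    import urllib.request

    UPSTREAM = "https://api.example.com"

    def handler(event, context):
        method = event.get("httpMethod", "GET")
        path = event.get("path", "/")
        body = event.get("body") or ""

        # Log the full inbound request, including the body.
        print(json.dumps({"direction": "request", "method": method,
                          "path": path, "body": body}))

        req = urllib.request.Request(UPSTREAM + path,
                                     data=body.encode() if body else None,
                                     method=method)
        with urllib.request.urlopen(req) as resp:
            status = resp.status
            resp_body = resp.read().decode()

        # Log the full outbound response, including the body.
        print(json.dumps({"direction": "response", "status": status,
                          "body": resp_body}))
        return {"statusCode": status, "body": resp_body}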

Related

Caching results of a lambda function

We are developing a serverless application. The application has "users" that get added, "groups" that have "permissions" on different "resources". To check if a user has permission to do an action on a resource, there would be some calculations we will need to do. (We are using DynamoDB)
Basically, before every action, we will need to check if the user has permission to do that particular action on the given resource. I was thinking we could have a Lambda function that checks a cache first and, if the value is not in the cache, hits the DB, does the calculation, writes to the cache, and returns.
What kind of cache would be best to use here? We are going to be calling this internally from the backend itself.
Is API gateway the way to go still?
How about ElastiCache for this purpose? Can we use it without having to configure a VPC? We are trying not to have to use a VPC in our application.
Any better ways?
They are all good options!
ElastiCache is designed for caching data. API Gateway can also cache results.
An alternative is to keep the data "inside" the AWS Lambda function by using global variables. The values will still be present the next time the Lambda function is invoked in the same container, so you could cache results together with an expiry time. Note, however, that Lambda might launch multiple containers if the function is run frequently (or in parallel), and may discard a container if the function is not run for some time. Therefore, you might end up with multiple caches.
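A minimal sketch of that approach in Python; the compute_permissions() helper standing in for the DynamoDB lookup and calculation is hypothetical:

    import time

    # Module-level state survives across invocations of a warm container,
    # but each container has its own copy, so treat it as best-effort.
    CACHE = {}
    TTL_SECONDS = 300

    def compute_permissions(user):
        # Placeholder for the real DynamoDB lookup + calculation.
        return {"user": user, "allowed": True}

    def handler(event, context):
        user = event["user"]
        entry = CACHE.get(user)
        if entry and entry["expires"] > time.time():
            return entry["value"]                 # cache hit
        value = compute_permissions(user)         # cache miss: hit the DB
        CACHE[user] = {"value": value, "expires": time.time() + TTL_SECONDS}
        return value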
I'd say the simplest option would be API Gateway's cache.
Where is the permissions map (user <-> resource) stored?
This AWS blog post might be interesting (it's about caching in the Lambda execution environment's memory), since you could use a DynamoDB table for that.

What Amazon service should I use in order to serve merged files from an S3 bucket?

I need an HTTP web service that serves files (1-10 GiB) that are the result of merging smaller files from an S3 bucket. Such logic is pretty easy to implement, but I need very high scalability, so I would prefer to put it on the cloud. Which Amazon service would be most feasible for this particular case? Should I use AWS Lambda for that?
Unfortunately, you can't achieve that with Lambda, since it only offers 512 MB of storage, and you can't mount volumes. You would need EBS or EFS to download and process the data. Since you need scalability, I would suggest Fargate + EFS. Plain EC2 instances would do just fine, but you might lose some money because it can be tricky to provision the correct amount for your needs, and most of the time it is overprovisioned.
If you don't need to process the files in real time, you can use a single instance and use SQS to queue the jobs and save some money. In that scenario you could use Lambda to trigger the jobs, and even start/stop the instance when it is not in use.
Merging files
It is possible to concatenate Amazon S3 files by using UploadPartCopy:
Uploads a part by copying data from an existing object as data source.
However, the minimum allowable part size for a multipart upload is 5 MB.
Thus, if each of your parts is at least 5 MB, then this would be a way to concatenate files without downloading and re-uploading.
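A hedged sketch of that approach with boto3; bucket and key names are placeholders, and every part except the last must be at least 5 MB:

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-bucket"                       # hypothetical
    SOURCES = ["part-a", "part-b", "part-c"]   # hypothetical source keys
    TARGET = "merged-object"

    upload = s3.create_multipart_upload(Bucket=BUCKET, Key=TARGET)
    parts = []
    for number, key in enumerate(SOURCES, start=1):
        # Copy each existing object in as one part, without downloading it.
        result = s3.upload_part_copy(
            Bucket=BUCKET, Key=TARGET,
            UploadId=upload["UploadId"], PartNumber=number,
            CopySource={"Bucket": BUCKET, "Key": key},
        )
        parts.append({"ETag": result["CopyPartResult"]["ETag"],
                      "PartNumber": number})

    s3.complete_multipart_upload(
        Bucket=BUCKET, Key=TARGET, UploadId=upload["UploadId"],
        MultipartUpload={"Parts": parts},
    )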
Streaming files
Alternatively, rather than creating new objects in Amazon S3, your endpoint could simply read each file in turn and stream the contents back to the requester. This could be done via API Gateway and AWS Lambda. Your AWS Lambda code would read each object from S3 and keep returning the contents until the last object has been processed.
First, let me clarify your goal: you want to have an endpoint, say https://my.example.com/retrieve that reads some set of files from S3 and combines them (say, as a ZIP)?
If yes, does whatever language/framework that you're using support chunked encoding for responses?
If yes, then it's certainly possible to do this without storing anything on disk: you read from one stream (the file coming from S3) and write to another (the response). I'm guessing you knew that already based on your comments to other answers.
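For illustration, a minimal sketch of that stream-to-stream approach, assuming Flask and boto3 on an always-running server; the bucket and key names are placeholders:

    import boto3
    from flask import Flask, Response

    app = Flask(__name__)
    s3 = boto3.client("s3")

    @app.route("/retrieve")
    def retrieve():
        keys = ["part-0001", "part-0002"]     # hypothetical source objects
        def generate():
            for key in keys:
                body = s3.get_object(Bucket="my-merge-bucket", Key=key)["Body"]
                # Stream each object through in 1 MiB chunks, never holding
                # a whole file in memory or on disk.
                for chunk in iter(lambda: body.read(1024 * 1024), b""):
                    yield chunk
        # Returning a generator makes Flask send a chunked response.
        return Response(generate(), mimetype="application/octet-stream")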
However, based on your requirement of 1-10 GB of output, Lambda won't work because it has a limit of 6 MB for response payloads (and iirc that's after Base64 encoding).
So in the AWS world, that leaves you with an always-running server, either EC2 or ECS/EKS.
Unless you're doing some additional transformation along the way, this isn't going to require a lot of CPU, but if you expect high traffic it will require a lot of network bandwidth. Which to me says that you want to have a relatively large number of smallish compute units. Keep a baseline number of them always running, and scale based on network bandwidth.
Unfortunately, smallish EC2 instances in general have lower bandwidth, although the a1 family seems to be an exception to this. And Fargate doesn't publish bandwidth specs.
That said, I'd probably run on ECS with Fargate due to its simpler deployment model.
Beware: your biggest cost with this architecture will almost certainly be data transfer. And if you use a NAT, not only will you be paying for its data transfer, you'll also limit your bandwidth. I would at least consider running in a public subnet (with assigned public IPs).

How can I add ip-based rate limits with longer intervals on API Gateway?

I have an API Gateway endpoint that I would like to limit access to. For anonymous users, I would like to set both daily and monthly limits (based on IP address).
AWS WAF has the ability to set rate limits, but the interval for them is a fixed 5 minutes, which is not useful in this situation.
API Gateway has the ability to add usage plans with longer term rate quotas that would suit my needs, but unfortunately they seem to be based on API keys, and I don't see a way to do it by IP.
Is there a way to accomplish what I'm trying to do using AWS Services?
Is it maybe possible to use a usage plan and automatically generate an API key for each user who wants to access the API? Or is there some other solution?
Without more context on your specific use-case, or the architecture of your system, it is difficult to give a “best practice” answer.
Like most things tech, there are a few ways you could accomplish this. One way would be to use a combination of CloudWatch API logging, Lambda, DynamoDB (with Streams) and WAF.
At a high level (and regardless of this specific need) I'd protect my API using WAF and the AWS security automations quickstart, found here, and associate it with my API Gateway as guided in the docs here. Once my WAF is set up and associated with my API Gateway, I'd enable CloudWatch API logging for API Gateway, as discussed here. Now that I have things set up, I'd create two Lambdas.
The first will parse the CloudWatch API logs and write the data I'm interested in (IP address and request time) to a DynamoDB table. To avoid unnecessary storage costs, I'd set the TTL on the record I'm writing to my DynamoDB table to be twice whatever my analysis's time window is... i.e., if I'm looking to limit it to 1000 requests per month, I'd set the TTL on my DynamoDB record to be 2 months. From there, my CloudWatch API log group will have a subscription filter that sends log data to this Lambda, as described here.
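A hedged sketch of that first Lambda; the table name, the TTL attribute name, and the assumption that the access logs are JSON lines containing an "ip" field are all mine:

    import base64
    import gzip
    import json
    import time
    import boto3

    dynamodb = boto3.client("dynamodb")
    TTL_SECONDS = 2 * 30 * 24 * 3600   # twice the one-month analysis window

    def handler(event, context):
        # Subscription filters deliver log data gzipped and base64-encoded.
        payload = json.loads(gzip.decompress(
            base64.b64decode(event["awslogs"]["data"])))
        now = int(time.time())
        for log_event in payload["logEvents"]:
            message = json.loads(log_event["message"])   # assumes JSON access logs
            dynamodb.put_item(
                TableName="api-request-log",             # hypothetical table
                Item={
                    "ip": {"S": message["ip"]},
                    # A real table would need a finer-grained sort key to
                    # avoid same-second collisions for one IP.
                    "request_time": {"N": str(now)},
                    "ttl": {"N": str(now + TTL_SECONDS)},  # table TTL attribute
                },
            )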
My second Lambda is going to do the actual analysis and handle what happens when my metric is exceeded. This Lambda is going to be triggered by the write event to my DynamoDB table, as described here. I can have this Lambda run whatever analysis I want, but I'm going to assume that I want to limit access to 1000 requests per month for a given IP. When the new DynamoDB item triggers my Lambda, the Lambda is going to query the DynamoDB table for all records that were created in the preceding month from that moment and that contain the IP address. If the number of records returned is less than or equal to 1000, it is going to do nothing. If it exceeds 1000, then the Lambda is going to update the WAF WebACL, specifically calling UpdateIPSet to reject traffic from that IP, and that's it. Pretty simple.
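And a hedged sketch of the second Lambda, triggered by the table's stream; it assumes the table layout from the sketch above and a classic regional WAF IP set whose ID is a placeholder:

    import time
    import boto3

    dynamodb = boto3.client("dynamodb")
    waf = boto3.client("waf-regional")

    MONTH_SECONDS = 30 * 24 * 3600
    LIMIT = 1000
    IP_SET_ID = "example-ip-set-id"   # hypothetical

    def handler(event, context):
        for record in event["Records"]:
            if record["eventName"] != "INSERT":
                continue
            ip = record["dynamodb"]["NewImage"]["ip"]["S"]
            since = int(time.time()) - MONTH_SECONDS
            # Count this IP's requests over the preceding month.
            resp = dynamodb.query(
                TableName="api-request-log",
                KeyConditionExpression="#ip = :ip AND #t >= :since",
                ExpressionAttributeNames={"#ip": "ip", "#t": "request_time"},
                ExpressionAttributeValues={":ip": {"S": ip},
                                           ":since": {"N": str(since)}},
                Select="COUNT",
            )
            if resp["Count"] > LIMIT:
                # Block the offending IP in the WAF IP set.
                token = waf.get_change_token()["ChangeToken"]
                waf.update_ip_set(
                    IPSetId=IP_SET_ID,
                    ChangeToken=token,
                    Updates=[{"Action": "INSERT",
                              "IPSetDescriptor": {"Type": "IPV4",
                                                  "Value": ip + "/32"}}],
                )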
With the above process I have near real-time monitoring of requests to my API Gateway, in a very efficient, cost-effective, scalable manner that can be deployed entirely serverless.
This is just one way to handle this; there are definitely other ways you could accomplish it, say with Kinesis and Elasticsearch, or by analyzing CloudTrail events instead of logs, or by using a third-party solution that integrates with AWS, or something else.

AWS Elasticache Vs API Gateway Cache

I am new to Serverless architecture using AWS Lambda and still trying to figure out how some of the pieces fit together. I have converted my website from EC2 (React client, and node API) to a serverless architecture. The React Client is now using s3 static web hosting and the API has been converted over to use AWS Lambda and API Gateway.
In my previous implementation I was using Redis as a cache for caching responses from other third-party APIs.
API Gateway has the option to enable a cache, but I have also looked into ElastiCache as an option. They are both comparable in price, with the API Gateway cache being slightly costlier.
The one issue I have run into when trying to use ElastiCache is that it needs to be running in a VPC, and then I can no longer call out to my third-party APIs.
I am wondering if there is any benefit to using one over the other? Right now the main purpose of my cache is to reduce requests to the API, but that may change over time. Would it make sense to have a Lambda dedicated to checking ElastiCache first to see if there is a value stored, and if not, triggering another Lambda to retrieve the information from the API - is this even possible? Or for my use case would the API Gateway cache be the better option?
Or possibly a completely different solution altogether. It's a bit of a shame that nearly everything else will qualify for the free tier, but having some sort of cache will add around $15 a month.
I am still very new to this kind of setup so any kind of help or direction would be greatly appreciated. Thank you!
I am wondering if there is any benefit to using one over the other?
API Gateway internally uses ElastiCache to support caching, so functionally they both behave in the same way. The advantage of using API Gateway caching is that API Gateway checks the cache before invoking the backend Lambda, so you save the cost of a Lambda invocation for responses that are served from the cache.
Another difference is that when you use the API Gateway cache, the cache lookup time will not count towards the 29-second integration timeout limit in cache-miss cases.
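If you decide to go that route, enabling the stage cache is a one-off configuration. A minimal sketch via boto3; the REST API ID and stage name are placeholders:

    import boto3

    apigw = boto3.client("apigateway")
    apigw.update_stage(
        restApiId="a1b2c3d4e5",   # hypothetical
        stageName="prod",
        patchOperations=[
            {"op": "replace", "path": "/cacheClusterEnabled", "value": "true"},
            {"op": "replace", "path": "/cacheClusterSize", "value": "0.5"},
        ],
    )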
Right now the main purpose of my cache is to reduce requests to the API but that may change over time.
I suggest making your decision about the cache based on your current use case. You might use a completely new cache or a different solution for other caching requirements later.
Would it make sense to have a Lambda dedicated to checking ElastiCache first to see if there is a value stored, and if not, triggering another Lambda to retrieve the information from the API - is this even possible? Or for my use case would the API Gateway cache be the better option?
In general, I would not suggest having an additional Lambda just for checking the cache value (to avoid extra latency and aggravating Lambda's cold start problem). Either way, as mentioned above, you would end up paying for a Lambda invocation even for requests that are served by the cache. If you use the API Gateway cache, cached requests will not even reach a Lambda.
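If you do end up on ElastiCache, the check belongs inside the same Lambda. A minimal sketch, assuming redis-py is bundled with the function; the cluster endpoint, cache key, and API URL are placeholders:

    import urllib.request
    import redis

    # Requires the Lambda to run inside the cluster's VPC (with a NAT or
    # VPC endpoints if it also needs to reach external APIs).
    cache = redis.Redis(host="my-cluster.xxxxxx.cache.amazonaws.com", port=6379)

    def handler(event, context):
        symbol = event["queryStringParameters"]["symbol"]   # hypothetical key
        cached = cache.get(symbol)
        if cached is not None:
            return {"statusCode": 200, "body": cached.decode()}
        # Cache miss: call the third-party API, then store with a 60 s TTL.
        with urllib.request.urlopen(
                "https://api.example.com/quote/" + symbol) as resp:
            body = resp.read().decode()
        cache.set(symbol, body, ex=60)
        return {"statusCode": 200, "body": body}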

How To Prevent AWS Lambda Abuse by 3rd-party apps

Very interested in getting hands-on with Serverless in 2018. Already looking to implement usage of AWS Lambda in several decentralized app projects. However, I don't yet understand how you can prevent abuse of your endpoint from a 3rd-party app (perhaps even a competitor), from driving up your usage costs.
I'm not talking about a DDoS, or where all the traffic is coming from a single IP, which can happen on any network, but specifically having a 3rd-party app's customers directly make the REST calls, which cause your usage costs to rise, because their app is piggy-backing on your "open" endpoints.
For example:
I wish to create an endpoint on AWS Lambda to give me the current price of Ethereum ETH/USD. What would prevent another (or every) dapp developer from using MY lambda endpoint and causing excessive billing charges to my account?
When you deploy an endpoint that is open to the world, you're opening it to be used, but also to be abused.
AWS provides services to mitigate common abuse methods, such as AWS Shield, which protects against DDoS, etc. However, these services cannot know what is or is not abuse of your Lambda function, which is what you are asking about.
If your Lambda function is private, then you should use one of the API gateway security mechanisms to prevent abuse:
IAM security
API key security
Custom security authorization
With one of these in place, your Lambda function can only be called by authorized users. Without one of these in place, there is no way to prevent the type of abuse you're concerned about.
Unlimited access to your public Lambda functions - whether by bad actors or by bad software developed by legitimate 3rd parties - can result in unwanted usage of billable corporate resources and can degrade application performance. It is important that you consider ways of limiting and restricting access by your Lambda clients as part of your system's security design, to prevent runaway function invocations and uncontrolled costs.
Consider using the following approach to preventing execution "abuse" of your Lambda endpoint by 3rd party apps:
One factor you want to control is concurrency, or the number of concurrent requests that are supported per account and per function. You are billed per request plus total memory allocation per request, so this is the unit you want to control. To prevent runaway costs, you prevent runaway executions - whether by bad actors, or by bad software shipped by legitimate 3rd parties.
From Managing Concurrency
The unit of scale for AWS Lambda is a concurrent execution (see Understanding Scaling Behavior for more details). However, scaling indefinitely is not desirable in all scenarios. For example, you may want to control your concurrency for cost reasons, or to regulate how long it takes you to process a batch of events, or to simply match it with a downstream resource. To assist with this, Lambda provides a concurrent execution limit control at both the account level and the function level.
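Setting the function-level limit is a single API call. A minimal sketch with boto3; the function name is a placeholder:

    import boto3

    boto3.client("lambda").put_function_concurrency(
        FunctionName="my-public-function",   # hypothetical
        ReservedConcurrentExecutions=5,      # caps parallel executions, and cost
    )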
In addition to per account and per Lambda invocation limits, you can also control Lambda exposure by wrapping Lambda calls in an AWS API Gateway, and Create and Use API Gateway Usage Plans:
After you create, test, and deploy your APIs, you can use API Gateway usage plans to extend them as product offerings for your customers. You can provide usage plans to allow specified customers to access selected APIs at agreed-upon request rates and quotas that can meet their business requirements and budget constraints.
What Is a Usage Plan? A usage plan prescribes who can access one or more deployed API stages - and also how much and how fast the caller can access the APIs. The plan uses an API key to identify an API client and meters access to an API stage with the configurable throttling and quota limits that are enforced on individual client API keys.
The throttling prescribes the request rate limits that are applied to each API key. The quotas are the maximum number of requests with a given API key submitted within a specified time interval. You can configure individual API methods to require API key authorization based on usage plan configuration. An API stage is identified by an API identifier and a stage name.
Using API Gateway Limits to create Gateway Usage Plans per customer, you can control API and Lambda access and prevent uncontrolled account billing.
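For illustration, a hedged sketch of creating such a usage plan and attaching an API key with boto3; the IDs, names, and limits are placeholders:

    import boto3

    apigw = boto3.client("apigateway")

    plan = apigw.create_usage_plan(
        name="basic-customer-plan",
        throttle={"rateLimit": 10.0, "burstLimit": 20},       # requests/sec
        quota={"limit": 10000, "period": "MONTH"},            # monthly cap
        apiStages=[{"apiId": "a1b2c3d4e5", "stage": "prod"}], # hypothetical
    )
    key = apigw.create_api_key(name="customer-1", enabled=True)
    apigw.create_usage_plan_key(
        usagePlanId=plan["id"], keyId=key["id"], keyType="API_KEY")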
#Matt's answer is correct, yet incomplete.
Adding a security layer is a necessary step towards security, but it doesn't protect you from authenticated callers, as #Rodrigo's answer states.
I actually just encountered - and solved - this issue on one of my lambdas, thanks to this article: https://itnext.io/the-everything-guide-to-lambda-throttling-reserved-concurrency-and-execution-limits-d64f144129e5
Basically, I added a single line to my serverless.yml file, in the function that gets called by the said authorized 3rd party:
    reservedConcurrency: 1
And here goes the whole function:
    refresh-cache:
      handler: src/functions/refresh-cache.refreshCache
      # XXX Ensures the lambda always has one slot available, and never uses more than one lambda instance at once.
      # Prevents GraphCMS webhooks from abusing our lambda (GCMS will trigger the webhook once per create/update/delete operation)
      # This makes sure only one instance of that lambda can run at once, to avoid refreshing the cache with parallel runs
      # Avoids spawning tons of API calls (most of them would time out anyway, around 80%)
      # See https://itnext.io/the-everything-guide-to-lambda-throttling-reserved-concurrency-and-execution-limits-d64f144129e5
      reservedConcurrency: 1
      events:
        - http:
            method: POST
            path: /refresh-cache
            cors: true
The refresh-cache lambda was invoked by a webhook triggered by a third-party service whenever any data changed. When importing a dataset, it would, for instance, trigger as many as 100 calls to refresh-cache. This behaviour was completely spamming my API, which in turn was running requests to other services in order to perform a cache invalidation.
Adding this single line improved the situation a lot, because only one instance of the lambda was running at once (no concurrent runs). The number of calls was divided by ~10: instead of 50 calls to refresh-cache, it only triggered 3-4, and all those calls worked (200 instead of 500 due to timeouts).
Overall, pretty good. Not yet perfect for my workflow, but a step forward.
Not related, but I used https://epsagon.com/, which tremendously helped me figure out what was happening on AWS Lambda. Here is what I got:
Before applying reservedConcurrency limit to the lambda:
You can see that most calls fail with a timeout (30000 ms); only the first few succeed because the lambda isn't overloaded yet.
After applying reservedConcurrency limit to the lambda:
You can see that all calls succeed, and they are much faster. No timeout.
Saves both money and time.
Using reservedConcurrency is not the only way to deal with this issue; there are many others, as #Rodrigo stated in his answer. But it's a working one that may fit your workflow. It's applied at the Lambda level, not in API Gateway (if I understand the docs correctly).