How to access DocumentDB from a Lambda@Edge function? - amazon-web-services

I am trying to set up an event-triggered Lambda@Edge function from CloudFront.
This function needs to access the database and replace the URL's metadata before distributing it to users.
Issues I am facing:
My DocumentDB cluster is placed in a VPC private subnet and can't be accessed from outside the VPC.
My Lambda@Edge function can't connect to my VPC since they are in different regions.
The method I had in mind is to create an API on my web server (public subnet) for my Lambda function to call, but this doesn't seem like a very efficient method.
I'd appreciate any advice or an alternative way to implement this.
Thanks in advance.

Lambda@Edge has a few limitations you can read about here.
Among them is:
You can’t configure your Lambda function to access resources inside your VPC.
That means the VPC being in another region is not your problem; you simply can't place a Lambda@Edge function in any VPC.
The only solution I can think of is making your DocumentDB cluster publicly available on the internet, which doesn't seem like a great idea. You might be able to create a security group that only allows access from the CloudFront IP ranges, although I couldn't find out whether Lambda@Edge actually uses the same ranges :/
Generally I'd avoid putting too much business logic in Lambda@Edge functions - keep in mind they run on every request (or at the very least every request to the origin) and increase the latency for those requests. Network requests in particular are expensive in terms of time, more so if you communicate across continents to your primary region with the database.
If the information you need to update the URL metadata is fairly static, I'd try to serialize it and distribute it in the Lambda deployment package - reading from local storage is considerably cheaper and faster.
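For example, a minimal sketch of that approach in Python (metadata.json is a hypothetical file shipped inside the deployment package; the lookup-by-URI logic is just an assumption about what "replace the URL's metadata" means): the file is parsed once per execution environment, so each request only does a local dictionary lookup.

    import json
    import os

    # Load the serialized metadata once per execution environment (cold start),
    # not on every request. "metadata.json" is a hypothetical file bundled with
    # the deployment package alongside this handler.
    _METADATA_PATH = os.path.join(os.path.dirname(__file__), "metadata.json")
    with open(_METADATA_PATH) as f:
        URL_METADATA = json.load(f)

    def handler(event, context):
        # CloudFront origin-request event: rewrite the URI from the local lookup.
        request = event["Records"][0]["cf"]["request"]
        new_uri = URL_METADATA.get(request["uri"])
        if new_uri:
            request["uri"] = new_uri
        return request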

Related

Caching results of a lambda function

We are developing a serverless application. The application has "users" that get added, and "groups" that have "permissions" on different "resources". To check whether a user has permission to perform an action on a resource, there are some calculations we need to do. (We are using DynamoDB.)
Basically, before every action we need to check whether the user has permission to perform that particular action on the given resource. I was thinking we could have a Lambda function that checks a cache first and, if the value is not in the cache, hits the DB, does the calculation, writes the result to the cache, and returns it.
What kind of cache would be best to use here? We are going to be calling this internally from the backend itself.
Is API Gateway still the way to go?
How about ElastiCache for this purpose? Can we use it without having to configure a VPC? We are trying not to use a VPC in our application.
Any better ways?
They are all good options!
ElastiCache is designed for caching data. API Gateway can also cache results.
An alternative is to keep the data "inside" the AWS Lambda function by using global variables. The values will remain present the next time the Lambda function is invoked, so you could cache results along with an expiry time. Note, however, that Lambda might launch multiple containers if the function is run frequently (even in parallel), or recycle the container if it is not run for some time. Therefore, you might end up with multiple caches.
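A minimal sketch of that in-memory approach, assuming a hypothetical DynamoDB table named permissions keyed on user_id and a 60-second expiry:

    import time

    import boto3

    # Module-level objects survive across invocations of the same execution
    # environment, so they act as a per-container cache.
    _table = boto3.resource("dynamodb").Table("permissions")  # placeholder table name
    _cache = {}           # user_id -> (value, expiry timestamp)
    _TTL_SECONDS = 60

    def get_permissions(user_id):
        now = time.time()
        hit = _cache.get(user_id)
        if hit and hit[1] > now:
            return hit[0]
        # Cache miss or expired entry: read from DynamoDB and remember the
        # result together with an expiry time.
        item = _table.get_item(Key={"user_id": user_id}).get("Item", {})
        _cache[user_id] = (item, now + _TTL_SECONDS)
        return item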
I'd say the simplest option would be API Gateway's cache.
Where is that permissions map (user <-> resource) stored?
This AWS blog post might be interesting (it's about caching in the Lambda execution environment's memory), because you could use a DynamoDB table for that.

Organising stacks and shared resources in AWS CloudFormation and Serverless

I have an architectural question about the design and organisation of AWS Serverless resources using CloudFormation.
Currently I have multiple stacks organised by domain-specific purpose, and this works well. Most of the stacks that contain Lambdas have to be transformed using the Serverless transform (using SAM for all of them). The async communication is facilitated using a combination of EventBridge and S3 + events and works well. The issue I have is with synchronous communication.
I don't want to reference Lambdas from other stacks using their exported names and invoke them directly, as this causes issues with updates and versions (if output exports are referenced in other stacks, I cannot change the resource unless the reference is removed first, which is not ideal for CI/CD and keeping concerns separate).
I have been using API Gateway as an abstraction, but that feels rather heavy-handed. It is nice to have that separation, but needing a domain and DNS resolution, plus having the API GW exposed externally, doesn't feel right. Maybe there is a better way to configure API GW to be internal only. If you have had success with this, could you please point me in the right direction?
Is there a better way to abstract the invocation of Lambda functions from different stacks in a synchronous way? (Common template patterns for CloudFormation or something along those lines?)
I see two questions:
Alternatives for synchronous Lambda invocation with API Gateway.
API Gateway is one easy way, with IAM authentication to make it secure. HTTP APIs are a much simpler and cheaper option compared to REST APIs. We can choose a private API rather than a regional/edge one, which is not exposed outside the VPC, to make it even more secure.
We can have a private ALB with Lambda functions as targets, for a simple use case that doesn't need any API Gateway features (this will incur some cost every month).
We can always call Lambdas directly with an AWS SDK invoke (see the sketch below).
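A minimal sketch of the direct-invoke option with boto3 (the function name is a placeholder):

    import json

    import boto3

    lambda_client = boto3.client("lambda")

    def call_other_service(payload):
        # Synchronous (RequestResponse) invocation of a Lambda in another stack.
        # "other-stack-function" is a placeholder name; an ARN works as well.
        response = lambda_client.invoke(
            FunctionName="other-stack-function",
            InvocationType="RequestResponse",
            Payload=json.dumps(payload).encode("utf-8"),
        )
        return json.loads(response["Payload"].read())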
Alternatives to share resources between templates.
Exporting and importing will be a bit of a problem if we need to delete and recreate the resource; it shouldn't be a problem if we are just updating it, though.
We can always store the ARN of the Lambda function in an SSM parameter in the source template and resolve the value of the ARN from the SSM parameter in the destination template. This is completely decoupled, and it is better than simply hard-coding the ARN.
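If you prefer to resolve the ARN at runtime instead of in the destination template, a boto3 sketch could look like this (the parameter name is a placeholder written by the source stack):

    import boto3

    ssm = boto3.client("ssm")

    # "/shared/my-function-arn" is a placeholder parameter name published by the
    # source stack; reading it here keeps the two stacks fully decoupled.
    target_arn = ssm.get_parameter(Name="/shared/my-function-arn")["Parameter"]["Value"]

    # target_arn can then be passed as FunctionName to lambda_client.invoke(...)
    # exactly as in the earlier sketch.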

AWS Lambda with Elasticache Redis without NAT

I am going to describe my needs and what I currently have in place, so bear with me. Firstly, there is a Lambda function, say F1, which when invoked gets 100 links from a site. Most of these links, say about 95, are the same as when F1 was invoked the previous time, so further processing must be done only with the 5 "new" links. One solution was to write the already-processed links to a DynamoDB table and, each time F1 is invoked, query the table and skip those links. But I found that the database read, although it takes only milliseconds, doubles the Lambda runtime, and this can add up, especially if F1 is called frequently and if there are, say, a million processed links. So I decided to use ElastiCache with Redis.
I quickly found that Redis can be accessed only when F1 runs in the same VPC, and because F1 needs access to the internet you need a NAT. (I don't know much about networking.) So I followed the guidelines, set up the VPC and NAT, and got everything to work. I was delighted with the performance improvements, which almost cut the expected Lambda cost in half to $30 per month. But then I found that NAT is not included in the free tier, and I have to pay almost $30 per month just for the NAT. This is not ideal for me, as this project can be in development for months, and I feel like I am paying the same amount as the compute just for internet access.
I would like to know if I am making any fundamental mistakes. Am I using ElastiCache in the right way? Is there a better way to access both Redis and the internet? Is there any way to structure my stack differently so that I retain the performance without essentially paying twice the amount after the free tier ends? Maybe add another Lambda function? I don't have any ideas. Any minute improvements are much appreciated. Thank you.
There are many ways to accomplish this, and all of them have some trade-offs. A few other ideas for you to consider:
Run F1 without a VPC. It will have connectivity directly to DynamoDB without the need for a NAT, saving you the cost of the NAT gateway (see the sketch after this list).
Run your function on a micro EC2 instance rather than in Lambda, and persist your link lookups to a file on local disk, or even a local Redis. With all the serverless hype, I think people sometimes overestimate the difficulty (and underestimate the stability) of simply running an OS. It's not that hard to manage, it's easy to set up backups, and it may be an option depending on your availability requirements and other needs.
Save your link data to S3 and set up an S3 gateway VPC endpoint. I'm not sure if it will be fast enough for your needs.
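For the first idea, a minimal sketch assuming a hypothetical DynamoDB table named processed_links keyed on url; batching the lookups into a single BatchGetItem call keeps the per-invocation read overhead small even without Redis:

    import boto3

    dynamodb = boto3.resource("dynamodb")

    def filter_new_links(links):
        # "processed_links" is a hypothetical table keyed on "url".
        # BatchGetItem accepts up to 100 keys per call, which matches the
        # ~100 links fetched on each run.
        response = dynamodb.batch_get_item(
            RequestItems={
                "processed_links": {
                    "Keys": [{"url": link} for link in links],
                    "ProjectionExpression": "#u",
                    "ExpressionAttributeNames": {"#u": "url"},
                }
            }
        )
        seen = {item["url"] for item in response["Responses"].get("processed_links", [])}
        return [link for link in links if link not in seen]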

AWS ElastiCache vs API Gateway cache

I am new to serverless architecture using AWS Lambda and still trying to figure out how some of the pieces fit together. I have converted my website from EC2 (React client and Node API) to a serverless architecture. The React client now uses S3 static web hosting, and the API has been converted to use AWS Lambda and API Gateway.
In my previous implementation I was using Redis as a cache for responses from other third-party APIs.
API Gateway has the option to enable a cache, but I have also looked into ElastiCache as an option. They are comparable in price, with the API Gateway cache being slightly costlier.
The one issue I have run into when trying to use ElastiCache is that it needs to be running in a VPC, and I can then no longer call out to my third-party APIs.
I am wondering if there is any benefit to using one over the other? Right now the main purpose of my cache is to reduce requests to the API, but that may change over time. Would it make sense to have a Lambda dedicated to checking ElastiCache first to see if there is a value stored and, if not, triggering another Lambda to retrieve the information from the API? Is this even possible? Or for my use case would the API Gateway cache be the better option?
Or possibly a completely different solution altogether. It's a bit of a shame that nearly everything else will qualify for the free tier, but having some sort of cache will add around $15 a month.
I am still very new to this kind of setup so any kind of help or direction would be greatly appreciated. Thank you!
I am wondering if there is any benefit to using one over the other?
API Gateway internally uses ElastiCache to support caching, so functionally they both behave in the same way. The advantage of using API Gateway caching is that API Gateway checks the cache before invoking the backend Lambda, so you save the cost of a Lambda invocation for responses that are served from the cache.
Another difference is that when you use the API Gateway cache, the cache lookup time will not be counted towards the "29s integration timeout" limit in cache-miss cases.
Right now the main purpose of my cache is to reduce requests to the API but that may change over time.
I would suggest making your decision about the cache based on your current use case. You might use a completely new cache or a different solution for other caching requirements later.
Would it make sense to have a Lambda dedicated to checking Elasticache first to see if there is a value stored and if not triggering another Lambda to retrieve the information from the API or is this even possible. Or for my use case would API Gateway cache be the better option?
In general, I would not suggest having an additional Lambda just for checking the cache value (to avoid latency and aggravating Lambda's cold-start problem). Either way, as mentioned above, with that approach you will end up paying for a Lambda invocation even for requests that are served by the cache. If you use the API Gateway cache, cached requests will not even reach Lambda.
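For completeness, if you do end up with ElastiCache, a single Lambda can do the read-through itself rather than delegating to a second function. A minimal sketch, assuming the redis client library is bundled with the function and the endpoint is passed in a hypothetical REDIS_HOST environment variable:

    import json
    import os
    import urllib.request

    import redis  # the redis client library, bundled with the deployment package

    # Reuse the connection across invocations of the same execution environment.
    # REDIS_HOST is a placeholder environment variable with the ElastiCache endpoint.
    _redis = redis.Redis(host=os.environ["REDIS_HOST"], port=6379)
    _TTL_SECONDS = 300

    def get_with_cache(url):
        cached = _redis.get(url)
        if cached is not None:
            return json.loads(cached)
        # Cache miss: call the third-party API and store the response with a TTL
        # so stale entries expire on their own.
        with urllib.request.urlopen(url) as resp:
            value = json.loads(resp.read())
        _redis.setex(url, _TTL_SECONDS, json.dumps(value))
        return value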

What is the best practice for having several AWS Lambdas consuming an external (EC2) Postgres?

I have 100+ endpoints on AWS API Gateway, each one calling a specific Lambda function that connects to an external Postgres DB, fetches the data, and returns it.
My question is, if I do that, will each of the 100 Lambda functions need to have the connection details as environment variables?
That is probably not the way to do it... What is the best way to manage this in a single place, like global environment variables that can be accessed by all the Lambdas?
Another thing: every Lambda opening its own connection is very bad too. Is there a way to manage a connection pool that can be shared between Lambdas?
Is there any way to group them?
Thanks!
The recommended way of storing secret configuration for access by AWS services like Lambda functions is Secrets Manager. You store the connection details as a secret, and in each Lambda function's code you load the value from the Secrets Manager service as and when you require it.
Lambda functions are deliberately isolated, so there is no way to share connections or other resources between different functions. You can sometimes share resources between successive invocations of the same function, but you can't rely on this, and for something like a database connection it's not even a good use case. Instead, you should be trying to open and close the connection as compactly as possible, so that other function invocations can have access to the database's connection pool.
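A minimal sketch combining both points, assuming psycopg2 is packaged with the function and the connection details live in a hypothetical secret named prod/postgres: the secret is fetched once per execution environment, while the connection is opened and closed inside the handler.

    import json

    import boto3
    import psycopg2  # packaged with the function (e.g. via a Lambda layer)

    # Fetch the connection details once per execution environment, not per request.
    # "prod/postgres" is a placeholder secret name holding a JSON blob.
    _secret = json.loads(
        boto3.client("secretsmanager").get_secret_value(SecretId="prod/postgres")["SecretString"]
    )

    def handler(event, context):
        # Open the connection as late as possible and close it as early as possible,
        # so idle Lambda containers don't hold slots in the database's connection pool.
        conn = psycopg2.connect(
            host=_secret["host"],
            dbname=_secret["dbname"],
            user=_secret["username"],
            password=_secret["password"],
        )
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT 1")  # placeholder query
                rows = cur.fetchall()
        finally:
            conn.close()
        return {"statusCode": 200, "body": json.dumps(rows)}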