I read about many people struggling to connect their Lambda functions to DynamoDB because the functions live in a VPC. But my question is, why use a VPC at all?
VPCs are meant to protect services with a direct connection to the outside world (AKA the internet). Things like RDS, for instance, which are just sitting ducks waiting to be queried by anyone who knows the URL, and which can therefore fall victim to DDoS attacks or to zero-day exploits that bypass the credentials, amongst other things.
But AWS Lambda and DynamoDB aren't such things; they don't have a direct connection to the internet. Access to them is protected by IAM credentials, so they are de facto secure against such DDoS/zero-day exploits.
Hence the question: why use a VPC for Lambda/DynamoDB if they don't benefit from it and, on the contrary, it makes things more complicated to configure?
I don't see the benefit of using a VPC for either Lambda or DynamoDB.
But maybe my understanding is wrong?
If your Lambda function only needs to connect to DynamoDB, then it would be wrong to place the Lambda function in a VPC.
If your Lambda function needs to access an EC2 instance or an RDS instance or some other service running inside the VPC, and also needs to connect to DynamoDB, then the Lambda function would have to run in the VPC and you would need to provide access to DynamoDB via a VPC Endpoint or a NAT Gateway.
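The rule of thumb above can be written down as a small decision helper. This is only a sketch: the function name `plan_lambda_networking` and its three boolean flags are made up for illustration and are not part of any AWS API.

```python
from typing import List


def plan_lambda_networking(needs_vpc_resources: bool,
                           needs_aws_services: bool,
                           needs_public_internet: bool) -> List[str]:
    """Return the networking pieces a Lambda function needs (hypothetical helper)."""
    if not needs_vpc_resources:
        # Only DynamoDB or other public AWS APIs: keep the function out of the VPC.
        return ["no VPC configuration"]

    plan = ["attach function to private subnets in the VPC"]
    if needs_public_internet:
        # A NAT Gateway covers both the internet and the AWS service endpoints.
        plan.append("NAT Gateway route from the private subnets")
    elif needs_aws_services:
        # AWS-only traffic can use VPC endpoints instead of a NAT Gateway.
        plan.append("VPC endpoint per AWS service (e.g. DynamoDB)")
    return plan
```

For the RDS-plus-DynamoDB case described above, `plan_lambda_networking(True, True, False)` yields the VPC attachment plus a VPC endpoint.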
I am connecting to AWS DocumentDB from a Lambda function. To do this I had to attach the Lambda function to the default VPC (that's where the DocumentDB cluster is running) and to the default (public) subnets. But this has caused my Lambda function to time out whenever it tries to make an outbound request, e.g. pushing a message to SQS. This I want to avoid.
So what is the recommended way of connecting to DocumentDB without losing the functionality that is lost when putting the Lambda function in the VPC? There's gotta be a simple solution.
Lambda functions in a VPC never get a public IP address. So if the function needs to access both VPC resources and resources outside of the VPC, it has to be deployed only to private subnets with routes to a NAT Gateway.
Alternatively, if the only external resources you need to access are other AWS services, then you could add VPC Endpoints for those services to the VPC.
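As a rough sketch of the VPC endpoint option, the helper below builds the arguments for the EC2 `create_vpc_endpoint` call for an interface endpoint such as SQS. The helper name and every ID passed to it are placeholders of mine; the boto3 call itself is left as a comment so the snippet runs without AWS credentials.

```python
from typing import Any, Dict, List


def interface_endpoint_params(region: str, service: str, vpc_id: str,
                              subnet_ids: List[str],
                              security_group_ids: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for EC2 create_vpc_endpoint (interface type)."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        # AWS service endpoints are named com.amazonaws.<region>.<service>.
        "ServiceName": f"com.amazonaws.{region}.{service}",
        "SubnetIds": subnet_ids,
        "SecurityGroupIds": security_group_ids,
        # Private DNS lets the SDK keep using the normal service hostname.
        "PrivateDnsEnabled": True,
    }


# Usage (needs boto3 and credentials; the IDs are hypothetical):
#   params = interface_endpoint_params("us-east-1", "sqs", "vpc-EXAMPLE",
#                                      ["subnet-EXAMPLE"], ["sg-EXAMPLE"])
#   boto3.client("ec2").create_vpc_endpoint(**params)
```

The security group on the endpoint has to allow HTTPS (port 443) from the function's security group, otherwise the calls will still time out.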
I'm new to AWS and struggling to understand how they've laid out their components, especially around networking & access.
In this case, I'm toying with an API GW and a "hello world" lambda. I made the lambda (no VPC) and hooked it up to an API GW, and now I have a publicly-accessible lambda. I didn't understand why the lambda was callable without being in a VPC, but I finally stumbled upon this explanation in the docs: https://docs.aws.amazon.com/lambda/latest/dg/configuration-vpc.html#vpc-internet
Seems weird to default open, but okay.
So, now I'm trying to close off the API via the networking-related config. So I created a VPC & private subnets (no IGW, NOT publicly accessible), and put the lambda in there. I felt confident it would no longer be accessible, 'cuz that's how VPC & networking works, yet the lambda is still publicly accessible! Why?
The API GW doesn't have access to this VPC, and in any case, this VPC doesn't have internet access. The way these components are interacting doesn't seem to make sense. What's going on here?
API Gateway allows you to create private endpoints as well as public ones. It sounds like you want a private endpoint.
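For illustration, a private REST API is usually paired with a resource policy that rejects any call not arriving through a specific `execute-api` VPC endpoint. The sketch below only builds that policy document; the function name and the endpoint ID `vpce-EXAMPLE` are placeholders, and attaching the policy to an API is not shown.

```python
import json
from typing import Any, Dict


def private_api_resource_policy(vpc_endpoint_id: str) -> Dict[str, Any]:
    """Resource policy that denies calls not arriving through one VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Principal": "*",
                "Action": "execute-api:Invoke",
                "Resource": "execute-api:/*",
                "Condition": {
                    "StringNotEquals": {"aws:SourceVpce": vpc_endpoint_id}
                },
            },
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "execute-api:Invoke",
                "Resource": "execute-api:/*",
            },
        ],
    }


# JSON text of the policy, using a placeholder endpoint ID.
policy_json = json.dumps(private_api_resource_policy("vpce-EXAMPLE"), indent=2)
```

Note that this restricts who can reach the API; it is independent of whether the Lambda function behind it has a VPC configuration.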
My mistake was thinking lambdas can be put in a VPC. Rather, they can only be given access to a VPC:
Configuring a Lambda function to access resources in a VPC
You can configure a Lambda function to connect to private subnets in a virtual private cloud (VPC) in your AWS account.
https://docs.aws.amazon.com/lambda/latest/dg/configuration-vpc.html
So all my comments about VPC logic don't apply, and that's why my lambda is publicly accessible.
Furthermore, in case this is helpful to others:
I originally had the misconception that all AWS components were basically servers, and they all needed to be put in VPCs. This pattern does not apply to serverless components (including S3, Lambda, DynamoDB, etc.)! These operate in their own area, and the point of the component is to (mostly) abstract away things like networking details. Instead, they tend to control access via other methods, like IAM policies, security groups, "integrations", etc.
Check for an Internet Gateway and a NAT Gateway. Check which VPC the Lambda function is actually connected to (having several VPCs can be confusing), and lastly check the VPC endpoints as well.
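One way to script the gateway part of that checklist is to look at where the default route of the function's subnet points. A minimal sketch, assuming the input is one entry of the `RouteTables` list returned by the EC2 `describe_route_tables` call; the function name and the four labels it returns are my own.

```python
from typing import Any, Dict


def default_route_target(route_table: Dict[str, Any]) -> str:
    """Classify where 0.0.0.0/0 goes for one describe_route_tables entry."""
    for route in route_table.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("NatGatewayId"):
            # Private subnet with outbound internet: works for Lambda.
            return "nat"
        if route.get("GatewayId", "").startswith("igw-"):
            # Public subnet: Lambda gets no public IP, so outbound calls time out.
            return "igw"
        return "other"
    # No default route: only VPC endpoints are reachable from this subnet.
    return "none"
```

A result of `"igw"` for the subnets the function is attached to is the classic cause of timeouts.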
I'm working on a serverless application that works with a database in RDS. For security reasons, both the application (Lambda) and the database are located in a private subnet in a VPC.
I also want to access AWS services from the application - for example, I would like to access Secrets Manager to obtain database credentials, put a rule in EventBridge and use the STS service.
I know that I can use VPC endpoints and deploy an interface endpoint in my VPC for each service of interest.
My question is as follows - the sole reason that the application is in the private subnet is database access. Why shouldn't I just create another Lambda function that is not in my VPC, can access these services easily and for free, and just invoke it from my main application?
What are the security risks? What am I missing?
Thanks
If I understand correctly, you want to create another Lambda function that runs outside of the VPC and is invoked by the Lambda function inside the VPC.
Well, you can certainly do that, but it would also require either a NAT gateway to reach the outside Lambda function or a VPC endpoint for the Lambda service. Moreover, you will pay twice, once for each Lambda invocation, and you will also want to keep an eye on the running time of both functions.
can access these services easily and for free
Nothing is really free in AWS. You will have to pay for the ENI used by the VPC endpoint or for the NAT gateway. And also for the Lambda invocations.
What are the security risks?
Security-wise, you are not really missing anything.
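For reference, the cross-function call discussed in this answer might be prepared as below. This is a sketch only: `build_invoke_params` is a hypothetical helper, and the actual `invoke` call is shown as a comment because, from inside the VPC, it only succeeds with a NAT gateway or a `com.amazonaws.<region>.lambda` interface endpoint in place.

```python
import json
from typing import Any, Dict


def build_invoke_params(function_name: str, payload: Dict[str, Any],
                        wait_for_result: bool = True) -> Dict[str, Any]:
    """Build keyword arguments for the Lambda Invoke API call."""
    return {
        "FunctionName": function_name,
        # RequestResponse keeps the caller running (and billed) while the
        # callee works; Event queues the request and returns immediately.
        "InvocationType": "RequestResponse" if wait_for_result else "Event",
        "Payload": json.dumps(payload).encode("utf-8"),
    }


# Usage (needs boto3 and credentials; the function name is hypothetical):
#   boto3.client("lambda").invoke(**build_invoke_params("helper-fn", {"k": "v"}))
```

With `RequestResponse`, both functions accrue duration at the same time, which is the "pay twice" effect mentioned above.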
I have a lambda function which runs every 15 minutes and saves some data in DynamoDB.
Now I want to secure the DynamoDB call made by my Lambda function so that the request does not go via the internet, but rather through Amazon's internal network. There is no EC2 instance involved here though.
I have seen a few recommendations for using PrivateLink, which binds DynamoDB to VPC endpoints so that calls made from EC2 instances always go via the internal network, bypassing the internet.
I was wondering whether such a configuration is possible for Lambda calling DynamoDB, since Lambda itself does not run in any EC2 instance and is instead serverless?
The first thing I would say is that all of your traffic between Lambda and DynamoDB is signed and encrypted, so that's typically sufficient.
There are use cases, most typically compliance reasons, when this is not sufficient. In that case you can deploy the Lambda function into a VPC of your making and configure the VPC with a private VPC endpoint for DynamoDB. Typically, the VPC would be configured without an internet gateway or NAT so that it has no egress route to the public internet. Be aware that your Lambda function startup latency may be higher than usual: originally, each Lambda function environment needed to attach its own ENI for access to the private endpoint. Since AWS reworked Lambda's VPC networking in 2019, ENIs are created when the function is configured and shared across environments, which removes most of that cold-start cost.
See Configuring a Lambda Function to Access Resources in an Amazon VPC.
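For the compliance case, the DynamoDB endpoint can additionally carry an endpoint policy so that only the expected table and actions pass through it. A minimal sketch, assuming a single table; the helper name and the ARN in the usage comment are placeholders. The returned JSON string is the shape expected by the `PolicyDocument` argument of EC2 `create_vpc_endpoint`.

```python
import json
from typing import Any, Dict, List


def dynamodb_endpoint_policy(table_arn: str, actions: List[str]) -> str:
    """Return a VPC endpoint policy (JSON string) limited to one table."""
    policy: Dict[str, Any] = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                # Only these DynamoDB actions may pass through the endpoint.
                "Action": actions,
                "Resource": table_arn,
            }
        ],
    }
    return json.dumps(policy)


# Usage (hypothetical ARN):
#   dynamodb_endpoint_policy(
#       "arn:aws:dynamodb:us-east-1:123456789012:table/Example",
#       ["dynamodb:PutItem"])
```

The endpoint policy does not replace the function's IAM role; a request has to be allowed by both.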
If you don't need to access resources in a VPC, AWS recommends not to run AWS Lambda functions in a VPC. From AWS Lambda Best Practices:
Don't put your Lambda function in a VPC unless you have to. There is no benefit outside of using this to access resources you cannot expose publicly, like a private Amazon Relational Database instance. Services like Amazon Elasticsearch Service can be secured over IAM with access policies, so exposing the endpoint publicly is safe and wouldn't require you to run your function in the VPC to secure it.
Running Lambda functions in a VPC adds additional complexity, which can negatively affect scalability and performance. Each Lambda function in a VPC needs an Elastic Network Interface (ENI). Provisioning ENIs is slow and the number of ENIs you can have is limited, so when you scale up you can run into a shortage of ENIs, preventing your Lambda functions from scaling up further.
This is one way to do it.
Step 1) Deploy your Lambda function inside a VPC.
Step 2) Create a VPC endpoint for DynamoDB.
This should help: https://aws.amazon.com/blogs/aws/new-vpc-endpoints-for-dynamodb/
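The two steps could be sketched with boto3 roughly as follows. The helper below only assembles the arguments for the two API calls, in order; its name and all IDs are hypothetical, and it assumes a gateway-type endpoint, which is attached to route tables rather than to subnets.

```python
from typing import Any, Dict, List, Tuple


def dynamodb_via_vpc_steps(function_name: str, region: str, vpc_id: str,
                           subnet_ids: List[str],
                           security_group_ids: List[str],
                           route_table_ids: List[str]
                           ) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (client.method, kwargs) pairs for the two steps, in order."""
    return [
        # Step 1: attach the function to the VPC (Lambda API).
        ("lambda.update_function_configuration", {
            "FunctionName": function_name,
            "VpcConfig": {
                "SubnetIds": subnet_ids,
                "SecurityGroupIds": security_group_ids,
            },
        }),
        # Step 2: gateway endpoint for DynamoDB, added to the route tables
        # of the subnets the function uses (EC2 API).
        ("ec2.create_vpc_endpoint", {
            "VpcEndpointType": "Gateway",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.dynamodb",
            "RouteTableIds": route_table_ids,
        }),
    ]
```

Each pair maps to a call such as `boto3.client("lambda").update_function_configuration(**kwargs)`. The function's execution role also needs permission to manage network interfaces, for example via the AWSLambdaVPCAccessExecutionRole managed policy.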
Currently, building a serverless app that uses DynamoDB and Elasticsearch is quite easy.
Using serverless, you just declare everything in serverless.yml and you are good to go.
Problems (quickly) arise when you need to use RDS or ElastiCache, because you run into all kinds of trouble with VPCs... which then simply defeats the serverless paradigm (developers should only focus on code).
The quickest solution is then to use a 3rd-party service (like RedisLabs or ClearDB).
My question is: why do RDS and ElastiCache require VPC mode? Why aren't they usable directly like a 3rd-party service?
EDIT: as noted in the comments, you can place DynamoDB and Elasticsearch behind a VPC.
The problem then becomes: how to efficiently access them (RDS, ElastiCache, DynamoDB, Elasticsearch) from a Lambda function?
You need to configure the VPC of the lambda function to access all the other VPCs as described in https://docs.aws.amazon.com/lambda/latest/dg/vpc.html
Also, consider that the Lambda function also needs to access 3rd-party services on the internet (e.g. SendGrid, OneSignal, ...), so I think that you still need a NAT somewhere.
The distinction here is where the resources are actually running. Both Elasticsearch and DynamoDB are managed services running outside of your AWS account. RDS and ElastiCache are different - they are launched into your AWS account, hence the need to tell AWS where you want to run them.
By the way, RDS doesn't require VPC. You can optionally run it in EC2-Classic or in EC2-VPC. And those are the only options to run compute on AWS (either in VPC, or not in VPC), so you are not actually being constrained here. You are simply being asked which you prefer.
The solution for access to private resources in your VPC (like RDS databases) is to configure the Lambda function to run in that VPC. Now the Lambda function is essentially inside the VPC, so it is constrained by the VPC's networking configuration. For the Lambda function to reach external websites, it needs a route to the public internet. Typically the way you do this in VPC is to configure an IGW and some form of NAT (roll-your-own, or managed NAT from AWS). This is all normal VPC behavior, and not specific to Lambda.
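To make the IGW-plus-NAT layout concrete, here is a sketch of the two default routes involved, expressed as arguments for the EC2 `create_route` call. It assumes the managed NAT Gateway variant with one public and one private route table; the helper name and all IDs are placeholders.

```python
from typing import Any, Dict, List


def internet_egress_routes(public_route_table_id: str,
                           private_route_table_id: str,
                           internet_gateway_id: str,
                           nat_gateway_id: str) -> List[Dict[str, Any]]:
    """Build kwargs for the two EC2 create_route calls."""
    return [
        # Public subnet (where the NAT Gateway lives): default route to the IGW.
        {
            "RouteTableId": public_route_table_id,
            "DestinationCidrBlock": "0.0.0.0/0",
            "GatewayId": internet_gateway_id,
        },
        # Private subnet (where the Lambda function attaches): default route to the NAT.
        {
            "RouteTableId": private_route_table_id,
            "DestinationCidrBlock": "0.0.0.0/0",
            "NatGatewayId": nat_gateway_id,
        },
    ]


# Usage (needs boto3 and credentials; IDs are hypothetical):
#   for kwargs in internet_egress_routes("rtb-PUBLIC", "rtb-PRIVATE",
#                                        "igw-EXAMPLE", "nat-EXAMPLE"):
#       boto3.client("ec2").create_route(**kwargs)
```

The Lambda function is then configured with the private subnets only.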
The best article I found on the subject: http://blog.brianz.bz/post/accessing-vpc-resources-with-lambda/
It explains nearly everything about VPC access from Lambda (what a VPC is, why you need it, how to access it from Lambda, and how to configure it with serverless).