I am using a lambda function in a VPC to connect to an RDS instance in the same VPC. I am considering removing the lambda from the VPC to massively reduce the cold-start time but I want to keep my RDS instance in the VPC.
Can anyone foresee major problems with making the lambda function use an SSH tunnel to connect to a bastion instance within the VPC and subsequently to the RDS instance? Or something similar with a VPN?
There will obviously be some overhead as the traffic takes an extra 'hop', so to speak, but would it be significant enough to make this approach infeasible? Or is the only current approach to keep the Lambda in the same VPC and try to keep a few invocations warm?
I also pay for a NAT gateway so my Lambda in a VPC can access the internet. If I can get it out of the VPC by using an SSH tunnel to connect to the RDS instance, it will also simplify my architecture here and reduce my operating costs.
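For concreteness, here is a minimal sketch of the tunnel idea, assuming a MySQL RDS instance and the `sshtunnel` and `pymysql` packages bundled with the function; all hostnames, paths, and credentials are hypothetical:

```python
# Hedged sketch: Lambda (outside the VPC) -> SSH tunnel via bastion -> RDS.
# Assumes sshtunnel and pymysql are bundled in the deployment artifact;
# hostnames and credentials below are placeholders.
import pymysql
from sshtunnel import SSHTunnelForwarder

BASTION_HOST = "bastion.example.com"                          # public bastion DNS (hypothetical)
RDS_HOST = "mydb.xxxxxx.us-east-1.rds.amazonaws.com"          # private RDS endpoint (hypothetical)

def handler(event, context):
    # Open the tunnel per invocation; in practice you would cache it across
    # warm invocations to amortize the SSH handshake cost.
    with SSHTunnelForwarder(
        (BASTION_HOST, 22),
        ssh_username="ec2-user",
        ssh_pkey="/var/task/bastion_key.pem",                 # key shipped with the bundle (hypothetical)
        remote_bind_address=(RDS_HOST, 3306),
    ) as tunnel:
        conn = pymysql.connect(
            host="127.0.0.1",
            port=tunnel.local_bind_port,                      # locally forwarded port
            user="app",
            password="secret",
            database="mydb",
        )
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT 1")
                return cur.fetchone()
        finally:
            conn.close()
```

Note that the SSH handshake itself adds latency on every cold connection, which is exactly the 'extra hop' overhead the question is about.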
Cold starts caused by Lambda functions in a VPC are a big issue, especially when you want to use a relational database. Luckily, AWS has acknowledged this issue and there is hope on the horizon:
Aurora Serverless now supports the Data API, which lets you run SQL queries over HTTPS using the AWS SDK (see the sketch below). It was released on Nov 20 ('18), is in beta and only in us-east-1, but it's a start.
During re:Invent '18 an improvement to the VPC cold-start issue was announced (but with no release date yet), in which they basically create an ENI for a group of Lambda functions and keep that ENI ready even when no Lambda is warm.
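Here is a minimal sketch of the Data API idea using boto3's `rds-data` client as it exists today (the beta API at launch differed slightly); the cluster and secret ARNs are placeholders:

```python
# Hedged sketch of the Data API: SQL over HTTPS via the AWS SDK, with no
# VPC attachment for the Lambda. ARNs below are hypothetical.
import boto3

rds_data = boto3.client("rds-data", region_name="us-east-1")

def handler(event, context):
    result = rds_data.execute_statement(
        resourceArn="arn:aws:rds:us-east-1:123456789012:cluster:my-aurora",          # hypothetical
        secretArn="arn:aws:secretsmanager:us-east-1:123456789012:secret:my-db-creds",  # hypothetical
        database="mydb",
        sql="SELECT id, name FROM users WHERE id = :id",
        parameters=[{"name": "id", "value": {"longValue": 42}}],
    )
    return result["records"]
```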
Related
I have read about many people struggling to connect their Lambda to their DynamoDB because they live in a VPC. But my question is: why use a VPC at all?
VPCs are meant to protect services that have a direct connection to the outside world (a.k.a. the internet). Things like RDS, for instance, which are just sitting ducks waiting to be queried by anyone who knows the URL, and which can therefore fall victim to DDoS attacks, or zero-day exploits that could bypass the credentials, amongst other things.
But AWS Lambda and DynamoDB aren't such things: they don't have a direct connection to the internet. Access to them is protected by IAM credentials, and they are de facto secure against such DDoS/zero-day exploits.
Hence the question: why use a VPC for Lambda/DynamoDB if they don't benefit from it, and it on the contrary makes things more complicated to configure?
I don't see the benefits of using a VPC for either Lambda nor DynamoDB.
But maybe my understanding is wrong?
If your Lambda function only needs to connect to DynamoDB, then it would be wrong to place the Lambda function in a VPC.
If your Lambda function needs to access an EC2 instance or an RDS instance or some other service running inside the VPC, and also needs to connect to DynamoDB, then the Lambda function would have to run in the VPC and you would need to provide access to DynamoDB via a VPC Endpoint or a NAT Gateway.
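One thing worth seeing concretely: the DynamoDB call in your handler is identical either way; only the network routing (VPC endpoint or NAT) changes. A minimal sketch, with a hypothetical table name:

```python
# Hedged sketch: the application code is the same whether the function runs
# inside a VPC (reaching DynamoDB via an endpoint/NAT) or outside it.
import boto3

table = boto3.resource("dynamodb").Table("my-table")  # hypothetical table name

def handler(event, context):
    table.put_item(Item={"pk": event["id"], "payload": event["data"]})
    return {"ok": True}
```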
I have a lambda function which runs every 15 minutes and saves some data in DynamoDB.
Now I want to secure the DynamoDB call made by my Lambda so that the request does not go via the internet, but rather through Amazon's internal network. There is no EC2 instance involved here, though.
I have seen a few recommendations for using PrivateLink, which binds DynamoDB to VPC endpoints so that calls made from EC2 instances always go via the internal network, bypassing the internet.
I was wondering whether such a configuration is possible for a Lambda calling DynamoDB, since the Lambda itself does not run on any EC2 instance and is rather serverless?
The first thing I would say is that all of your traffic between Lambda and DynamoDB is signed and encrypted, so that's typically sufficient.
There are use cases, most typically for compliance reasons, where this is not sufficient. In that case you can deploy the Lambda function into a VPC of your own making and configure the VPC with a private VPC endpoint for DynamoDB. Typically, the VPC would be configured without an internet gateway or NAT so that it has no egress route to the public internet. Be aware that your Lambda function's startup latency will be higher than usual, because each Lambda function environment needs to attach an ENI for access to the private endpoint.
See Configuring a Lambda Function to Access Resources in an Amazon VPC.
If you don't need to access resources in a VPC, AWS recommends not to run AWS Lambda functions in a VPC. From AWS Lambda Best Practices:
Don't put your Lambda function in a VPC unless you have to. There is no benefit outside of using this to access resources you cannot expose publicly, like a private Amazon Relational Database instance. Services like Amazon Elasticsearch Service can be secured over IAM with access policies, so exposing the endpoint publicly is safe and wouldn't require you to run your function in the VPC to secure it.
Running Lambda functions in a VPC adds additional complexity, which can negatively affect scalability and performance. Each Lambda function in a VPC needs an Elastic Network Interface (ENI). Provisioning ENIs is slow and the number of ENIs you can have is limited, so when you scale up you can run into a shortage of ENIs, preventing your Lambda functions from scaling up further.
This is one way to do it.
Step 1) Deploy your Lambda inside the VPC.
Step 2) Create a VPC endpoint for DynamoDB (see the sketch below).
This should help: https://aws.amazon.com/blogs/aws/new-vpc-endpoints-for-dynamodb/
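A minimal sketch of step 2 using boto3, since DynamoDB uses a gateway-type endpoint attached to route tables; the VPC and route table IDs are placeholders:

```python
# Hedged sketch: create a gateway VPC endpoint for DynamoDB so traffic from
# the VPC stays on the AWS network. IDs below are hypothetical.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

resp = ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",                 # hypothetical VPC
    ServiceName="com.amazonaws.us-east-1.dynamodb",
    RouteTableIds=["rtb-0123456789abcdef0"],       # route table(s) of the Lambda's subnets (hypothetical)
)
print(resp["VpcEndpoint"]["VpcEndpointId"])
```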
Currently, building a serverless app that uses DynamoDB and Elasticsearch is quite easy.
Using Serverless, you just declare everything in serverless.yml and you are good to go.
Problems (quickly) arise when you need to use RDS or ElastiCache, because you run into all kinds of trouble with VPCs... which simply defeats the serverless paradigm (the developer should only focus on code).
The quickest solution is then to use a 3rd-party service (like RedisLabs or ClearDb).
My question is: why do RDS and ElastiCache require VPC mode? Why aren't they usable directly, like a 3rd-party service?
EDIT: as noted in the comments, you can place DynamoDB and Elasticsearch behind a VPC.
The problem then becomes: how do you efficiently access them (RDS, ElastiCache, DynamoDB, Elasticsearch) from a Lambda function?
You need to configure the Lambda function's VPC access so it can reach those resources, as described in https://docs.aws.amazon.com/lambda/latest/dg/vpc.html
Also, consider that the Lambda may also need to access 3rd-party services on the internet (e.g. SendGrid, OneSignal, ...), so I think you still need a NAT somewhere.
The distinction here is where the resources are actually running. Both Elasticsearch and DynamoDB are managed services running outside of your AWS account. RDS and Elasticache are different - they are launched into your AWS account, hence the need to tell AWS where you want to run them.
By the way, RDS doesn't require VPC. You can optionally run it in EC2-Classic or in EC2-VPC. And those are the only options to run compute on AWS (either in VPC, or not in VPC), so you are not actually being constrained here. You are simply being asked which you prefer.
The solution for access to private resources in your VPC (like RDS databases) is to configure the Lambda function to run in that VPC. Now the Lambda function is essentially inside the VPC, so it is constrained by the VPC's networking configuration. For the Lambda function to reach external websites, it needs a route to the public internet. Typically the way you do this in VPC is to configure an IGW and some form of NAT (roll-your-own, or managed NAT from AWS). This is all normal VPC behavior, and not specific to Lambda.
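As a rough illustration of that NAT wiring with boto3 (a managed NAT gateway in a public subnet, plus a default route from the private route table used by the Lambda's subnets); all resource IDs are placeholders:

```python
# Hedged sketch of the usual VPC egress setup for a Lambda in private
# subnets. IDs below are hypothetical; assumes the IGW already exists.
import boto3

ec2 = boto3.client("ec2")

# The NAT gateway lives in a PUBLIC subnet and needs an Elastic IP allocation.
nat = ec2.create_nat_gateway(
    SubnetId="subnet-0aaa1111bbbb2222c",        # public subnet (hypothetical)
    AllocationId="eipalloc-0123456789abcdef0",  # pre-allocated EIP (hypothetical)
)
nat_id = nat["NatGateway"]["NatGatewayId"]

# Wait until the NAT gateway is usable before pointing routes at it.
ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])

# Default route for the PRIVATE route table used by the Lambda's subnets.
ec2.create_route(
    RouteTableId="rtb-0123456789abcdef0",       # private route table (hypothetical)
    DestinationCidrBlock="0.0.0.0/0",
    NatGatewayId=nat_id,
)
```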
The best article I found on the subject: http://blog.brianz.bz/post/accessing-vpc-resources-with-lambda/
It explains nearly everything about VPC access from Lambda (what a VPC is, why you need it, how to access it from Lambda, and how to configure it with Serverless).
I have an AWS Lambda function that makes use of an ElastiCache Redis cluster.
Since the Redis cluster is "locked" in a VPC, the Lambda function must reside in that VPC too.
For some reason, even if the Lambda is allocated an IP in a public subnet that has an internet gateway, it still cannot make connections to the outside (the internet), making it impossible to use Kinesis.
For that, AWS suggests using a NAT gateway, which lets the Lambda connect to the outside.
Basically, this works for me - but my issue is the money.
This solution is expensive for large amounts of data transfer, and I'm looking for some way to make it cheaper.
For a small POC that I've made, I paid ~$10.
This is too much for ~30GB as my production pipeline will run hundreds of gigabytes / month.
How do you suggest I let the Lambda function connect to the outside (specifically to Kinesis) without using a NAT gateway?
Thank you!
without using a NAT gateway?
Use a NAT instance.
You have to have one of these two things for anything in VPC to access the Internet from a private IP address.
NAT instances were exactly how this was always done in VPC, until the relatively new NAT Gateway service was rolled out.
You can also use a NAT gateway, which is a managed NAT service that provides better availability, higher bandwidth, and requires less administrative effort. For common use cases, we recommend that you use a NAT gateway rather than a NAT instance.
http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_NAT_Instance.html
Sure, it's easier, but it costs more. A lot more. The most significant difference in this case is that with a NAT instance, you pay a flat rate for use of the hardware, which could be an inexpensive t2.nano, $5/mo.
The NAT Gateway service is a high powered solution with nearly infinite scaling capacity, and is priced accordingly. A NAT instance is only as good as the hardware you choose to run it on, but I find t2.nano and t2.micro quite adequate for workloads requiring less than 250 Mbit/s of Internet connectivity.
Use the link, above, to learn more.
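One NAT-instance setting people most often forget is disabling the EC2 source/destination check so the instance may forward traffic for other hosts. A minimal sketch, with a hypothetical instance ID (the instance OS still needs IP forwarding and a masquerade rule, e.g. via iptables):

```python
# Hedged sketch: allow an EC2 instance to act as a NAT by disabling the
# source/destination check. The instance ID is hypothetical.
import boto3

ec2 = boto3.client("ec2")

ec2.modify_instance_attribute(
    InstanceId="i-0123456789abcdef0",      # the NAT instance (hypothetical)
    SourceDestCheck={"Value": False},
)
```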
Lambda function instances will never be assigned a public IP address, regardless of the type of VPC subnet you place them in. A NAT gateway is the only solution to provide a Lambda function inside a VPC with access to resources that reside outside the VPC (like Kinesis).
If that isn't going to work for you due to cost, you might look into running a Redis server on an EC2 instance with an Elastic IP, which would allow the Lambda function to connect without being inside the VPC. A similar alternative would be to use RedisLabs instead of ElastiCache.
Is it possible to connect from an AWS Lambda function to a Redis ElastiCache cluster?
I can't figure out if it's a configuration problem or it's simply not possible.
PS: I made a test from an EC2 instance and I can connect to the Redis node. Also the Lambda function and the Redis node are in the same region.
UPDATE (09 Oct 2015):
Amazon announced VPC for AWS Lambda functions. Details here
This means we can now access any resource in AWS behind VPC security group, including ElastiCache and RDS machines.
UPDATE (11 Feb 2016):
Amazon launched VPC for AWS Lambda.
https://aws.amazon.com/about-aws/whats-new/2016/02/access-resources-within-a-vpc-using-aws-lambda/
As of Feb 2016, AWS allows Lambda functions to connect to ElastiCache. Refer to Access Resources within a VPC using AWS Lambda. Here is a link to how it works - Tutorial: Configuring a Lambda Function to Access Amazon ElastiCache in an Amazon VPC
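A minimal connection sketch, assuming the Lambda has been configured with the cluster's VPC subnets and security group, the `redis` package is bundled, and the endpoint name is a placeholder:

```python
# Hedged sketch: connect to an ElastiCache Redis node from a Lambda that is
# attached to the same VPC. Endpoint below is hypothetical.
import redis

# Create the client outside the handler so warm invocations reuse the
# connection instead of reconnecting every time.
r = redis.Redis(
    host="my-cluster.xxxxxx.0001.use1.cache.amazonaws.com",  # hypothetical endpoint
    port=6379,
    socket_timeout=2,
)

def handler(event, context):
    r.set("last_event", event.get("id", "unknown"))
    return r.get("last_event").decode()
```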
Setting up an HTTP proxy or iptables wouldn't work, for the following reasons:
Redis calls are not HTTP and will not be handled by HTTP proxies. iptables (or any port forwarding, for that matter) either won't accept a domain name as the destination or is highly inefficient due to the DNS resolution required every time.
The best and most convenient method is to install twemproxy on an EC2 machine and route your requests through it. As a bonus, you suddenly have a fantastic sharding strategy deployed as well.
I have tried connecting Lambda to a Memcached ElastiCache cluster and it works fine. Redis should also be doable.
A couple of things to keep in mind:
Lambda and ElastiCache have to be in the same VPC.
When Lambda runs in a VPC, it won't have access to the internet (so access to public APIs won't work). A NAT gateway is required for this.
I was experiencing the same issue. I did not find a direct solution, but instead used the Lambda function to connect to an EC2 server using socket.io, which was pretty easy, and emit an event to that EC2 server.
When the EC2 server received the event, it performed the necessary Redis task (database cleanup after image thumbnail generation).
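A minimal sketch of that handoff using the python-socketio client; the EC2 host, port, and event name are all placeholders:

```python
# Hedged sketch of the workaround above: the Lambda emits a socket.io event
# and the EC2 server does the Redis work. Endpoint and event name are
# hypothetical.
import socketio

sio = socketio.Client()

def handler(event, context):
    sio.connect("http://ec2-host.example.com:3000")  # hypothetical EC2 endpoint
    # Hand off the Redis task to the EC2 server listening for this event.
    sio.emit("redis-cleanup", {"thumbnail_id": event.get("id")})
    sio.disconnect()
    return {"queued": True}
```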
Hope this helps! If anyone finds out how to connect to ElastiCache from Lambda directly I'd still love to know!