I have a VPC containing 2 Lambda functions A & B:
A is on 2 public subnets
B is on 2 private subnets with an RDS database instance
The VPC itself has Internet access via the NAT instance.
I need a 3rd party API to communicate with B, but B is in a private subnet. Now I was wondering whether API Gateway solves this problem or whether it requires more work.
Thanks in advance
If you want the 3rd-party to invoke the Lambda function and pass data to it, then AWS API Gateway is correct for the task. That is exactly what API Gateway is designed to do.
See: Using AWS Lambda with Amazon API Gateway - AWS Lambda
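As a rough illustration of what function B might look like behind API Gateway, here is a minimal sketch that assumes the Lambda proxy integration, where the HTTP body arrives as a string in event["body"]. The echoed "received" field is purely illustrative, and base64-encoded bodies are not handled.

```python
import json


def lambda_handler(event, context):
    """Handle a request forwarded by API Gateway (proxy integration assumed)."""
    # The body is a JSON string, or None when the caller sent no body.
    raw_body = event.get("body") or "{}"
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return {
            "statusCode": 400,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"error": "invalid JSON"}),
        }
    # Illustrative only: echo the payload back to the third party.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"received": payload}),
    }
```

API Gateway invokes the function through the Lambda service, so the third party never needs a network path into the private subnets.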
Alternatively, you could use an Application Load Balancer (a type of Elastic Load Balancer) and have it invoke the Lambda function.
See: Using AWS Lambda with an Application Load Balancer - AWS Lambda
You also mention that you have an AWS Lambda "on 2 public subnets". AWS Lambda functions should not be configured to connect to public subnets. They should either be configured to use "No VPC" (in which case they can directly access the Internet), or they should be connected to private subnets (and can use a NAT Gateway or NAT Instance to access the Internet if required).
See: Configuring a Lambda function to access resources in a VPC - AWS Lambda:
Connect your function to private subnets to access private resources. If your function needs internet access, use network address translation (NAT). Connecting a function to a public subnet doesn't give it internet access or a public IP address.
I've followed the tutorial here to create a VPC with public and private subnets.
Then I set up an AWS lambda function inside the public subnet to test if it could connect to the outside internet.
Here's my lambda function written in python3
import requests

def lambda_handler(event, context):
    # Plain HTTP GET to an external site to test outbound internet access.
    r = requests.get('http://www.google.com')
    print(r)
The function above failed to fetch the content of http://www.google.com when I set it inside the public subnet in a VPC.
Here's the error message:
"errorMessage": "HTTPConnectionPool(host='www.google.com', port=80):
Max retries exceeded with url: / (Caused by
NewConnectionError(': Failed to establish a new connection: [Errno 110]
Connection timed out',))", "errorType": "ConnectionError",
I don't understand why.
The route table of the public subnet looks like this:
The GET request to http://www.google.com should match igw-XXXXXXXXX target. Why can't the internet-gateway(igw) deliver the request to http://www.google.com and get back the website content?
This article says that I must set the lambda function inside the private subnet in order to have internet access.
If your Lambda function needs to access private VPC resources (for
example, an Amazon RDS DB instance or Amazon EC2 instance), you must
associate the function with a VPC. If your function also requires
internet access (for example, to reach a public AWS service endpoint),
your function must use a NAT gateway or instance.
But it doesn't explain why I can't set the lambda function inside the public subnet.
Lambda functions connected to a VPC public subnet cannot typically access the internet.
To access the internet from a public subnet you need a public IP or you need to route via a NAT that itself has a public IP. You also need an Internet Gateway (IGW). However:
Lambda functions do not, and cannot, have public IP addresses, and
the default route target in a VPC public subnet is the IGW, not a NAT
So, because the Lambda function only has a private IP and its traffic is routed to the IGW rather than to a NAT, all packets to the internet from the Lambda function will be dropped at the IGW.
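The reasoning above can be sketched as a small check over a route table. This is only an illustration: the route dictionaries mimic the keys returned by EC2's DescribeRouteTables (DestinationCidrBlock, GatewayId, NatGatewayId, InstanceId), the function names are made up, and a default route to an instance is assumed to be a NAT instance.

```python
def default_route_target(routes):
    """Return 'igw', 'nat', or None for the 0.0.0.0/0 route of a route table."""
    for route in routes:
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("GatewayId", "").startswith("igw-"):
            return "igw"
        # Assumption: an instance target on the default route is a NAT instance.
        if route.get("NatGatewayId") or route.get("InstanceId"):
            return "nat"
    return None


def lambda_can_reach_internet(routes):
    """A Lambda ENI has only a private IP, so only a NAT default route works."""
    return default_route_target(routes) == "nat"


public_subnet_routes = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0example"},
]
private_subnet_routes = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0example"},
]
```

With these sample tables, lambda_can_reach_internet returns False for the public subnet and True for the private one.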
Should I Configure my Lambda Function for VPC Access?
If your Lambda function does not need to reach private resources inside your VPC (e.g. an RDS database or Elasticsearch cluster) then do not configure the Lambda function to connect to the VPC.
If your Lambda function does need to reach private resources inside your VPC, then configure the Lambda function to connect to private subnets (and only private subnets).
NAT or Not?
If the Lambda function only needs access to resources in the VPC (e.g. an RDS database in a private subnet) then you don't need to route through NAT.
If the Lambda function only needs access to resources in the VPC and access to AWS services that are all available via private VPC Endpoint then you don't need to route through NAT. Use VPC Endpoints.
If your Lambda function needs to reach endpoints on the internet, then ensure there is a default route from the Lambda function's private subnets to a NAT instance or NAT Gateway in a public subnet. The VPC also needs an IGW attached; without one, internet access is not possible.
Be aware that a NAT gateway is charged per hour and per GB processed, so it's worth understanding how to reduce data transfer costs for NAT gateway.
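The three cases above boil down to one small decision. The sketch below is hypothetical (the function and parameter names are invented for illustration) and assumes you already know which AWS services your VPC has endpoints for.

```python
def needs_nat(needs_internet, aws_services=(), services_with_endpoints=()):
    """Decide whether a VPC-connected Lambda function needs a NAT route.

    needs_internet: True if the function calls endpoints on the public internet.
    aws_services: AWS services the function calls (e.g. "s3", "dynamodb").
    services_with_endpoints: services reachable through VPC endpoints in this VPC.
    """
    if needs_internet:
        return True
    # Any AWS service without a VPC endpoint still has to be reached via NAT.
    missing = set(aws_services) - set(services_with_endpoints)
    return bool(missing)
```

For example, needs_nat(False, ["s3", "dynamodb"], ["s3", "dynamodb"]) is False, while dropping the DynamoDB endpoint from the last argument makes it True.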
Best Practices
When configuring Lambda functions for VPC access, it is an HA best practice to configure multiple (private) subnets across different Availability Zones (AZs).
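One way to enforce this before deploying is a quick check over the chosen subnets. This is a sketch only: the subnet dictionaries use the SubnetId and AvailabilityZone keys that EC2's DescribeSubnets returns, the result has the shape of Lambda's VpcConfig parameter, and the helper name is an assumption.

```python
def build_vpc_config(subnets, security_group_ids):
    """Build a Lambda VpcConfig dict, refusing single-AZ subnet selections."""
    zones = {subnet["AvailabilityZone"] for subnet in subnets}
    if len(zones) < 2:
        raise ValueError("choose private subnets in at least two Availability Zones")
    return {
        "SubnetIds": [subnet["SubnetId"] for subnet in subnets],
        "SecurityGroupIds": list(security_group_ids),
    }
```

The check only covers AZ spread; whether each subnet is actually private still has to be verified separately.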
Intermittent Connectivity
Be sure that all the subnets you configure for your Lambda function are private subnets. It is a common mistake to configure, for example, 1 private subnet and 1 public subnet. This will result in your Lambda function working OK sometimes and failing at other times without any obvious cause.
For example, the Lambda function may succeed 5 times in a row, and then fail with a timeout (being unable to access some internet resource or AWS service). This happens because the first launch was in a private subnet, launches 2-5 reused the same Lambda function execution environment in the same private subnet (the so-called "warm start"), and then launch 6 was a "cold start" where the AWS Lambda service deployed the Lambda function in a public subnet where the Lambda function has no route to the internet.
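A simple way to catch this misconfiguration is to list any configured subnet whose default route points at an IGW. The sketch below assumes you have already collected, for each subnet, the target ID of its 0.0.0.0/0 route; the mapping and the function name are hypothetical.

```python
def public_subnets_in_config(subnet_ids, default_route_by_subnet):
    """Return the configured subnets whose default route targets an IGW."""
    return [
        subnet_id
        for subnet_id in subnet_ids
        if default_route_by_subnet.get(subnet_id, "").startswith("igw-")
    ]


# Hypothetical example: one private and one public subnet on the same function.
default_routes = {"subnet-private": "nat-0example", "subnet-public": "igw-0example"}
offenders = public_subnets_in_config(["subnet-private", "subnet-public"], default_routes)
```

Any subnet that shows up in the result is one where a cold start could land the function without a route to the internet.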
You can make a Lambda function access the public internet from within your VPC; you just need to make sure you really need it.
For accessing resources external to AWS such as Google API (like OP's example) you do need a Public IP. For other cases like RDS or S3 you don't need Public IP, you can use a VPC Endpoint, so communication between your Lambda and the desired AWS Service doesn't leave AWS network.
By default some AWS Services are indeed reached via public internet, but it doesn't have to be.
[EDIT]
Someone was concerned about scalability in the comments, but they missed this from AWS Docs:
"Multiple Lambda functions can share a network interface, if the functions share the same subnet and security group"
Also, you must have a Public IP for reaching Public Internet, whether you're using Lambda, EC2, ECS, even if you use a NAT Gateway it needs an Elastic Public IP if you want to reach the public internet through it.
Solution
To do that, you need to assign Elastic Public IPs to the Network Interfaces for each subnet linked to your lambda. First let's figure out which subnets and security groups are linked to your lambda:
Next, go to EC2 Service, find the Elastic IPs menu under Network & Security. Allocate one IP for each subnet (in the example above there are two subnets).
Go to Network Interfaces menu, find the network interfaces attached to your lambda (same subnet and security group).
Associate the Public IPs in the actions menu for each one:
That's it, now your Lambda can reach out to public internet.
As the title suggests, I placed my Lambda function in a private subnet and now it cannot access the DB, or it times out when scanning it. Prior to this, it could access and scan the DB. What should I do?
Your DynamoDB resources are not in your VPC. Since you've configured your Lambda functions to connect to your VPC, you need to set up a NAT Gateway or NAT Instance to allow your private resources to access the internet. As the docs state:
AWS Lambda uses the VPC information you provide to set up ENIs that
allow your Lambda function to access VPC resources. Each ENI is
assigned a private IP address from the IP address range within the
Subnets you specify, but is not assigned any public IP addresses.
Therefore, if your Lambda function requires Internet access (for
example, to access AWS services that don't have VPC endpoints), you
can configure a NAT instance inside your VPC or you can use the Amazon
VPC NAT gateway. You cannot use an Internet gateway attached to your
VPC, since that requires the ENI to have public IP addresses.
AWS Lambda Doc
Validate the following:
The route table for the Lambda function's subnets sends internet traffic to a NAT Gateway that resides in a public subnet.
If a DynamoDB Gateway endpoint exists, check its policy to ensure that it is not limited to specific sources.
Outbound access is allowed by the security group and the NACL.
I am working on a project where my main Lambda function is in a private subnet in one VPC, and some sister Lambda functions are in their own private subnets in a different VPC. How can I call these sister Lambdas from the main Lambda across VPCs without giving each of them internet access via a NAT gateway in a public subnet that has an internet gateway attached?
Other AWS services that my main lambda invokes are:
1. S3
2. Dynamodb
3. Autoscaling
4. ECS
5. RDS
This can be done, but there are some complex steps involved.
First of all, when you use the aws-sdk, the calls are made over the internet. To avoid this and reach the services from within the AWS network, AWS has introduced private VPC endpoints. I have only used the S3 and API Gateway private endpoints to date, but there are more types of VPC endpoint.
This is how I would do it today:
set up a private API Gateway API to invoke the Lambda. Private APIs are only accessible through a private VPC endpoint for API Gateway.
create a private VPC endpoint for API Gateway.
set up VPC peering between the VPCs.
(from the sister lambda on the other VPC) invoke the API through the VPC endpoint's public DNS URL.
The drawback of adding an API in front of the Lambda is that the API has a hard timeout of 29 seconds.
Hope this helps.
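For the last step, a call through the VPC endpoint's DNS name might be assembled roughly as follows. This is a sketch under assumptions: it uses the x-apigw-api-id header that API Gateway accepts for private APIs called via an endpoint-specific DNS name, and the helper name and sample IDs are made up.

```python
def private_api_request(vpce_dns_name, api_id, stage, path):
    """Build the URL and headers for calling a private API via a VPC endpoint."""
    url = f"https://{vpce_dns_name}/{stage}/{path.lstrip('/')}"
    # The endpoint DNS name is shared, so the header selects the target API.
    headers = {"x-apigw-api-id": api_id}
    return url, headers


url, headers = private_api_request(
    "vpce-0example.execute-api.us-east-1.vpce.amazonaws.com",  # hypothetical
    "abc123example",  # hypothetical REST API id
    "prod",
    "/orders",
)
```

The returned pair can be handed to any HTTP client, for example requests.get(url, headers=headers, timeout=29), keeping the client timeout in line with the 29-second limit mentioned above.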
You can make a Lambda function access the public internet from within your VPC. Solution A is the actual answer; Solution B is a more elegant alternative.
Solution A - Lambda in VPC + Public IP associated with ENI
For accessing resources external to AWS such as Google API (like OP's example) you do need a Public IP. For other cases like RDS or S3 you don't need a Public IP, you can use a VPC Endpoint, so communication between your Lambda and the desired AWS Service doesn't leave AWS network.
By default some AWS Services are indeed reached via public internet, but it doesn't have to be.
Now if you want to reach an actual external resource (e.g. google), you need to assign Elastic Public IPs to the Network Interfaces for each subnet linked to your lambda. First let's figure out which subnets and security groups are linked to your lambda:
Next, go to EC2 Service, find the Elastic IPs menu under Network & Security. Allocate one IP for each subnet (in the example above there are two subnets).
Go to Network Interfaces menu, find the network interfaces attached to your lambda (same subnet and security group).
Associate the Public IPs in the actions menu for each one:
That's it, now your Lambda can reach out to public internet.
[EDIT]
Someone was concerned about scalability issues with Solution A, saying each Lambda instance gets a new network interface, but they missed this from the AWS docs:
"Multiple Lambda functions can share a network interface, if the functions share the same subnet and security group"
So whatever scalability issues you may face have nothing to do with this solution or with how Lambda uses ENIs; you'd face the same issues using EC2, ECS, or EKS, not just Lambda.
Solution B - Decompose into multiple Lambdas
Requiring access both to external resources and VPC resources would seem like too much responsibility for a single function. You may want to rethink your design and decompose your single lambda function into at least two lambda functions:
Lambda A goes to external resources (e.g. Google API), fetches whatever data you need, and adds it to SQS. No need to attach it to a VPC, and no need to manually associate an Elastic Public IP with an ENI.
Lambda B processes the message from SQS and stores the results in a storage service (db, s3, efs, another queue, etc). This one lives within your VPC and doesn't need external access.
This way seems more scalable, more secure, each individual lambda is less complex and more maintainable, the architecture looks better overall.
Of course life is not always rainbows and butterflies, so Solution A is good and scalable enough, but improving the architecture is even better.
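Solution B could be sketched like this. It is only an outline: the fetch, send_message, and store callables are hypothetical stand-ins injected so the example runs without AWS access (in practice send_message would wrap an SQS client call and store would write to your database), and real Lambda handlers take only event and context.

```python
import json


def fetcher_handler(event, context, fetch, send_message):
    """Lambda A (not attached to a VPC): fetch external data and enqueue it."""
    data = fetch()  # e.g. a call to an external API
    send_message(json.dumps(data))  # e.g. a wrapper around SQS send_message
    return {"queued": 1}


def processor_handler(event, context, store):
    """Lambda B (inside the VPC): consume SQS records and persist them."""
    records = event["Records"]  # SQS-triggered events carry a "Records" list
    for record in records:
        store(json.loads(record["body"]))  # e.g. an insert into RDS
    return {"processed": len(records)}
```

Lambda B never touches the internet here; it only sees the messages that Lambda invokes it with. Whether it also needs a NAT or VPC endpoint depends on what else it calls.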
I am trying to have an architecture with:
Route53 <-> API gateway <-> Lambda <-> RDS and DynamoDB.
I am confused about some networking aspects here!
From most of the documentation, what I understand is that Lambda is by default launched in default VPC and can access internet from there but no resources inside a "VPC". And this 2nd VPC (in quotes) refers to non-default VPCs in most discussions. But what is not clear is what if I placed the Lambda and RDS both in default VPC, lambda in a public subnet with --vpc-config info and RDS in a private subnet, will my Lambda have the internet connection?
Even when everything is in default subnet, should I put my lambda function in to a private subnet with Internet access through an Amazon VPC NAT gateway?
I know it is a theoretical question - documents are confusing me by not explicitly mentioning what cannot be done!
From most of the documentation, what I understand is that Lambda is by
default launched in default VPC and can access internet from there but
no resources inside a "VPC".
That is incorrect. By default Lambda is not launched in a VPC at all. Or if it is in a VPC it is in one that you cannot see because it doesn't exist in your AWS account.
what if I placed the Lambda and RDS both in default VPC, lambda in a
public subnet with --vpc-config info and RDS in a private subnet, will
my Lambda have the internet connection?
No, your Lambda function will not have internet access, even in a public subnet. This is because it is never assigned a public IP address. Once you place a Lambda function inside a VPC you have to have a NAT gateway in order for the Lambda function to access anything outside the VPC.
Even when everything is in default subnet, should I put my lambda
function in to a private subnet with Internet access through an Amazon
VPC NAT gateway?
Yes, that is the correct way to provide a Lambda function with access to both a VPC and resources that exist outside the VPC.
Also note that DynamoDB (and the AWS API) does not run in your VPC. So if you place a Lambda function inside your VPC that needs to access DynamoDB, or anything else that is accessed via the AWS API, you will have to add a NAT gateway to the VPC.
Note that the "Default VPC" is the term for the VPC that is set up for you when you first create your AWS account. You can see this VPC in your account in the VPC service console. Aside from it being created for you with default settings, you should just think of this as another VPC in your account. The Default VPC is not used by Lambda when you don't specify a VPC, and it is not used by other services like DynamoDB that exist outside your VPC network.