Access AWS S3 from Lambda within VPC

Overall, I'm pretty confused about using AWS Lambda within a VPC. The problem is that my Lambda function times out while trying to access an S3 bucket, and the solution seems to be a VPC endpoint.
I've added the Lambda function to a VPC so it can access an RDS-hosted database (not shown in the code below, but functional). However, now I can't access S3, and any attempt to do so times out.
I tried creating a VPC S3 endpoint, but nothing changed.
VPC Configuration
I'm using the simple default VPC that was created when I first made an EC2 instance. It has four subnets, all created by default.
VPC Route Table
Destination                                Target          Status   Propagated
172.31.0.0/16                              local           Active   No
pl-63a5400a (com.amazonaws.us-east-1.s3)   vpce-b44c8bdd   Active   No
0.0.0.0/0                                  igw-325e6a56    Active   No
Simple S3 Download Lambda:
import io
import boto3
import pymysql  # used for the RDS access mentioned above (not shown)

def lambda_handler(event, context):
    # download_fileobj writes into a binary file-like object and returns None,
    # so buffer the download rather than returning the call's result directly.
    s3_obj = io.BytesIO()
    boto3.resource('s3').Bucket('marineharvester').download_fileobj(
        'Holding - Midsummer/sample', s3_obj)
    return s3_obj.getvalue()

There is another solution related to VPC endpoints.
In the AWS Console, choose the VPC service and then Endpoints. Create a new endpoint, associate it with the s3 service, and then select the VPC and its route table.
Then select the access level (full or custom) and it will work.
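For reference, the same steps can be scripted with boto3. This is a minimal sketch; the region, VPC ID, and route-table ID are hypothetical placeholders you would replace with your own:

import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')  # assumed region

# Create an S3 gateway endpoint and attach it to the VPC's route table
# (the IDs below are hypothetical placeholders).
response = ec2.create_vpc_endpoint(
    VpcEndpointType='Gateway',
    VpcId='vpc-0123456789abcdef0',
    ServiceName='com.amazonaws.us-east-1.s3',
    RouteTableIds=['rtb-0123456789abcdef0'],
)
print(response['VpcEndpoint']['VpcEndpointId'])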

With boto3, S3 URLs are virtual-hosted style by default, which requires internet access to resolve them to region-specific URLs. This causes the Lambda function to hang until it times out.
Resolving this requires a Config object when creating the client, which tells boto3 to create path-based S3 URLs instead:
import boto3
import botocore.config

# Path-style addressing resolves to the regional S3 endpoint, which the VPC endpoint can route.
client = boto3.client('s3', 'ap-southeast-2',
                      config=botocore.config.Config(s3={'addressing_style': 'path'}))
Note that the region in the call must be the region to which you are deploying the lambda and VPC Endpoint.
Then you will be able to use the pl-xxxxxx prefix list for the VPC Endpoint within the Lambda's security group, and still access S3.
Here is a working CloudFormation script that demonstrates this. It creates an S3 bucket, a Lambda (that puts records into the bucket) associated with a VPC containing only private subnets and the VPC endpoint, and the necessary IAM roles.

There's another issue having to do with subnets and routes that is not addressed in the other answers, so I am creating a separate answer with the proviso that all the above answers apply. You have to get them all right for the lambda function to access S3.
When you create a new AWS account (which I did last fall), there is no route table automatically associated with the subnets of your default VPC (see Route Tables -> Subnet Associations in the Console).
So if you follow the instructions to create an Endpoint and create a route for that Endpoint, no route gets added, because there's no subnet to put it on. And as usual with AWS you don't get an error message...
What you should do is create a subnet for your lambda function, associate that subnet with the route table and the lambda function, and then rerun the Endpoint instructions and you will, if successful, find a route table that has three entries like this:
Destination    Target
10.0.0.0/16    Local
0.0.0.0/0      igw-1a2b3c4d
pl-1a2b3c4d    vpce-11bb22cc
If you only have two entries (no 'pl-xxxxx' entry), then you have not yet succeeded.
In the end I guess it should be no surprise that a Lambda function needs a subnet to live on, like any other entity in a network. And it's probably advisable that it not live on the same subnet as your EC2 instances, because Lambda might need different routes or security permissions. Note that the Lambda GUI really wants you to have two subnets in two different AZs, which is also a good idea.
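For what it's worth, the subnet-association step can also be done with boto3. This is a minimal sketch with hypothetical placeholder IDs:

import boto3

ec2 = boto3.client('ec2')

# Associate the Lambda's subnet with the route table so the endpoint's
# pl- route actually applies to it (IDs are hypothetical placeholders).
ec2.associate_route_table(
    RouteTableId='rtb-0123456789abcdef0',
    SubnetId='subnet-0123456789abcdef0',
)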

The cause of my issue was not properly configuring the Outbound Rules of my security group. Specifically, I needed to add a Custom Protocol outbound rule with a destination of pl-XXXXXXXX (the S3 prefix list; the actual value is provided by the AWS Console).
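Here is a minimal boto3 sketch of adding such an egress rule; the security-group ID is a hypothetical placeholder, and the prefix-list ID shown is the one from the question's route table:

import boto3

ec2 = boto3.client('ec2')

# Allow all outbound traffic to the S3 prefix list
# (the security-group ID is a hypothetical placeholder).
ec2.authorize_security_group_egress(
    GroupId='sg-0123456789abcdef0',
    IpPermissions=[{
        'IpProtocol': '-1',
        'PrefixListIds': [{'PrefixListId': 'pl-63a5400a'}],
    }],
)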

I just wanted to add one other answer, which might affect those running functions with slow cold-start times.
I'd followed all the instructions about setting up a gateway for S3, but still it didn't work. I created a test Node.js function which simply listed the buckets - I verified that this didn't work without the S3 gateway, but did once the gateway was established. So I knew that part of things was working fine.
As I was debugging this, I kept changing the function's timeout to make sure the function was updated and that I was invoking and testing the latest version of the code.
I'd reduced the timeout to 10s, but it turned out my function needed more like 15s on a cold boot. Once I'd increased the timeout again, it worked.
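If you hit the same thing, here is a small boto3 sketch for raising the timeout; the function name and value are illustrative:

import boto3

# Give the function enough headroom for its cold start
# (function name and timeout value are placeholders).
boto3.client('lambda').update_function_configuration(
    FunctionName='my-function',
    Timeout=30,
)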

Adding to the answer from Luis RM, this is a construct that can be used in CDK:
// CDK v2 imports assumed; adjust the import paths for CDK v1.
import * as ec2 from 'aws-cdk-lib/aws-ec2'
import * as iam from 'aws-cdk-lib/aws-iam'

const vpcEndpoint = new ec2.GatewayVpcEndpoint(this, 'S3GatewayVpcEndpoint', {
  vpc: myVpc,
  // GatewayVpcEndpointAwsService resolves the regional service name
  // (e.g. com.amazonaws.us-west-1.s3) automatically.
  service: ec2.GatewayVpcEndpointAwsService.S3,
})

const rolePolicies = [
  {
    Sid: 'AccessToSpecificBucket',
    Effect: 'Allow',
    Action: [
      's3:ListBucket',
      's3:GetObject',
      's3:PutObject',
      's3:DeleteObject',
      's3:GetObjectVersion',
    ],
    Resource: ['arn:aws:s3:::myBucket', 'arn:aws:s3:::myBucket/*'],
    Principal: '*',
  },
]

rolePolicies.forEach((policy) => {
  vpcEndpoint.addToPolicy(iam.PolicyStatement.fromJson(policy))
})

To access S3 from within a Lambda function that is within a VPC, you can use a NAT gateway (a much more expensive solution compared to the VPC endpoint). If you have two private subnets within the VPC (where the subnets have a route to a NAT gateway) and associate them with the Lambda, it can access the S3 bucket like any Lambda outside a VPC.
Gotchas
If you associate a public subnet with the Lambda and expect it to work, it will not.
Make sure your security group is in place to accept ingress.
This approach makes any service available on the internet accessible to the Lambda function. For detailed steps you can follow this blog: https://blog.theodo.com/2020/01/internet-access-to-lambda-in-vpc/

There are 3 ways to access S3 from within a private subnet in a VPC:
NAT Gateway
Gateway Endpoint
Interface Endpoint
I assume that you don't want to use a NAT Gateway.
If you're using a Gateway endpoint, you don't need to change the endpoint you use to connect to S3. But if you're using an Interface endpoint, you need to update the S3 endpoint URL in your client, as sketched below.
There is a detailed step-by-step guide on doing the same here - https://www.cloudtechsimplified.com/aws-lambda-vpc-s3/
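To illustrate the Interface endpoint case, here is a hedged boto3 sketch; the endpoint-specific DNS name is a placeholder in the shape that S3 interface endpoints use (take the real one from the endpoint's details page):

import boto3

# With an interface endpoint, point the client at the endpoint-specific
# DNS name instead of the default regional endpoint (placeholder below).
s3 = boto3.client(
    's3',
    region_name='us-east-1',
    endpoint_url='https://bucket.vpce-0123456789abcdef0.s3.us-east-1.vpce.amazonaws.com',
)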

Related

AWS Lambda can't retrieve file from an Amazon S3 Bucket in same network

I have an AWS project that contains an S3 bucket, an RDS database, and Lambda functions.
I want Lambda to have access to both the S3 bucket and the RDS database.
The Lambda function connects to the RDS database correctly but times out when trying to retrieve an object from the S3 bucket:
Event needs-retry.s3.GetObject: calling handler <bound method S3RegionRedirectorv2.redirect_from_error of <botocore.utils.S3RegionRedirectorv2 object at 0x7f473a4ae910>>
...
(some more error lines)
...
botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://{bucket name}.s3.eu-west-3.amazonaws.com/{file name}.tar.gz"
So I understand that the reason would be that Lambda doesn't have internet access, and therefore my options are:
VPC endpoint (privatelink): https://aws.amazon.com/privatelink
NAT gateway for Lambda
But both go over the cloud (in the same region), which doesn't make any sense as they are both in the same project.
It's just a redundant cost for such a detail, and there must be a better solution, right?
Maybe it helps to think of the S3 bucket "in the same project" as having permission to use an object system that resides in a different network outside your own. Your Lambda is in a VPC, but S3 objects are not in your VPC. You access them either using public endpoints (over the internet) or privately by establishing an S3 gateway endpoint or a VPC interface endpoint. Neither of the private options uses the public internet.
As long as you are staying in the same region, the S3 gateway endpoint does not cost you money, but if you need to cross regions you will need to use a VPC interface endpoint. The differences are documented here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/privatelink-interface-endpoints.html
If you are trying to avoid costs, the S3 gateway endpoint might work for you; however, you will need to update the route tables associated with the gateway. The process is documented here: https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-s3.html

Lambda security group to S3 and Secrets Manager

I'm new to Terraform (TF) and AWS. I've written TF that creates an RDS cluster, a VPC and security groups, an S3 bucket, secrets in Secrets Manager (SM), as well as a Lambda that accesses all of the above. I've attached the RDS VPC and security group to the Lambda, so the code in the Lambda can successfully access the RDS. My problem is that I need a security group that allows the Lambda code to read from Secrets Manager (to get RDS user accounts) and from S3 (to get SQL scripts to execute on the RDS). So, a security group with outbound access to S3 and Secrets Manager.
How do I get Terraform to look up (data) the details of SM and S3, and then use this info to create the security group that allows the Lambda code to access SM and S3?
Currently I'm forcing my way with "All to All on 0.0....", which will not be allowed in the production environment.
So, a security group with outbound to S3 and secrets manager.
The easiest way would be to use an S3 VPC interface endpoint, not an S3 gateway endpoint. Thus, if you have two interface endpoints (for S3 and SM), both will be associated with a security group that you create in your code, or with the default one otherwise.
So, to limit your Lambda's access to S3 and SM, you just reference the interface endpoints' SGs in your Lambda's SG.

AWS Lambda timesout with boto3.resource('s3')

I have created a Lambda that reacts when files are uploaded to a bucket.
One of my first actions is to retrieve the version_id of the file using boto3.
Below is a function which gets the version_id based on bucket and key.
The s3_resource.Object call seems to work fine, but if I uncomment the line which prints the actual version_id, then my Lambda times out (the timeout is set to 120s).
Printing the object itself works fine; it's only when I try to print the version_id that it times out. Would this have something to do with the NAT gateway?
import boto3

def get_file_version_id(bucket, key):
    s3_resource = boto3.resource('s3')
    file_obj = s3_resource.Object(bucket, key)
    print(f'file_obj: {file_obj}')
    # print(f'version_id: {file_obj.version_id}')
    # return file_obj.version_id
    return "Some Return Value"
You are using the high-level Resource API calls rather than the low-level Client API calls.
Resources, such as s3.Bucket, have attributes, and these are lazy-loaded properties. So, when you create an s3.Object, that's a purely local thing. But when you try to access one of its properties, e.g. the content of an existing object or its version ID, the boto3 SDK will then make an actual API call to the S3 service.
The reason your code times out is most likely that you do not have a network path to the S3 service. That probably means you are running your Lambda function in a VPC and have either deployed it in a public subnet, or in a private subnet without giving that subnet a default route to the internet via a NAT and Internet Gateway, or an S3 VPC endpoint.
So, either deploy your Lambda function outside of a VPC. Or, if it needs to be in a VPC, deploy it into a private subnet (not a public subnet), and ensure you have an IGW and NAT in your public subnet and a default route from the Lambda's private subnet to the NAT. Or, alternatively, go the private subnet plus S3 VPC endpoint route.
PS check the event parameter passed into your Lambda function handler, just in case it actually provides you with the version ID. I'm not sure if it does or not, but it would be good to check.
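To make the lazy-loading point concrete, here is a sketch using the low-level client instead, where the network call happens on an explicit line (note that head_object only returns a VersionId when bucket versioning is enabled):

import boto3

def get_file_version_id(bucket, key):
    # The client call issues the network request explicitly, so a hang
    # here points clearly at a missing network path to S3.
    s3_client = boto3.client('s3')
    response = s3_client.head_object(Bucket=bucket, Key=key)
    return response.get('VersionId')  # absent if versioning is disabled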

AWS Lambda can't reach resources created from MobileHub

I am having an issue accessing resources created in MobileHub from Lambda, and that does not make sense to me at all. I have two questions (maybe it is the same question..):
Why can't Lambda access all resources created by MobileHub when it has full-access permissions to those specific resources? I mean, if I create those resources separately I can access them, but not the ones created from MobileHub.
Is there a way to grant access to these resources, or am I missing something?
Update
The issue was the VPC. Basically, when I enabled VPC on the Lambdas to reach RDS (which has no public access), I couldn't reach any other resources; when I disabled it, RDS was unreachable. The question is: how do I combine VPC with role policies?
You can find the resources associated with your project using the left-side navigation in the Mobile Hub console and select "Resources." If you want to enable your AWS Lambda functions to be able to make use of any AWS resources, then you'll need to add an appropriate IAM Policy to the Lambda Execute IAM Role. You can find this role in your project on the "Resources" page under "AWS Identity and Access Management Roles." It is the role that has "lambdaexecutionrole" in the name. Select this role then attach whatever policies you like in the IAM (Identity and Access Management) console.
For more information on how to attach policies to roles, see:
http://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_manage_modify.html
And, if you have further problems, you can get help from the AWS community in the forums, here:
https://forums.aws.amazon.com/forum.jspa?forumID=88
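As a hedged sketch, the policy attachment can also be scripted with boto3; the role name below is a hypothetical placeholder in the shape Mobile Hub uses:

import boto3

iam = boto3.client('iam')

# Attach a managed S3 policy to the project's Lambda execution role
# (role name is a placeholder; find the real one on the Resources page).
iam.attach_role_policy(
    RoleName='myproject_lambdaexecutionrole',
    PolicyArn='arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess',
)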
Update - WRT VPC Question
This question should really go to an expert on the AWS Lambda team. You can reach them in the AWS Forums (link above). However, I'll take a shot at answering (AWS Lambda experts feel free to chime in if I'm wrong here). When you set the VPC on the Lambda function, I expect that any network traffic coming from your Lambda function will have the same routing and domain name resolution behavior as anything else in your VPC. So, if your VPC has firewall rules which prevent traffic from the VPC to, for example, DynamoDB, then you won't be able to reach it. If that is the case, then you would need to update those rules in your VPC's security group(s) to open up out-going traffic. Here's a blurb from a relevant document.
From https://aws.amazon.com/vpc/details/:
"AWS resources such as Elastic Load Balancing, Amazon ElastiCache, Amazon RDS, and Amazon Redshift are provisioned with IP addresses within your VPC. Other AWS resources such as Amazon S3 and Amazon DynamoDB are accessible via your VPC’s Internet Gateway, NAT gateways, VPC Endpoints, or Virtual Private Gateway."
This doc seems to explain how to configure the gateway approach:
http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-endpoints.html

can AWS Lambda connect to RDS mySQL database and update the database?

I am trying to connect an AWS Lambda function to an RDS MySQL database.
I just want to update the database from my Lambda function. Is it possible to access RDS by specifying an IAM role and access policy?
I can connect to the MySQL database using a MySQL client, but when I try from Lambda I can't. Here is my code.
console.log('Loading function');
var doc = require('dynamodb-doc');
var dynamo = new doc.DynamoDB();
var mysql = require('mysql');

exports.handler = function(event, context) {
    //console.log('Received event:', JSON.stringify(event, null, 2));
    var operation = event.operation;
    delete event.operation;
    switch (operation) {
        case 'create':
            var conn = mysql.createConnection({
                host: 'lamdatest.********.rds.amazonaws.com', // RDS endpoint
                user: 'user', // MySQL username
                password: 'password', // MySQL password
                database: 'rdslamda'
            });
            conn.connect();
            console.log("connecting...");
            conn.query('INSERT INTO login (name,password) VALUES("use6","password6")', function(err, info) {
                console.log("insert: " + JSON.stringify(info) + " /err: " + err);
                // End the connection and signal completion only after the
                // asynchronous query has finished; calling context.succeed()
                // synchronously would end the invocation before the insert runs.
                conn.end();
                context.succeed();
            });
            console.log("insert values in to database");
            break;
        case 'read':
            // Pass the callback itself; invoking context.done() here would
            // end the invocation immediately.
            dynamo.getItem(event, context.done);
            break;
        default:
            context.fail(new Error('Unrecognized operation "' + operation + '"'));
    }
};
Yes. You can access a MySql RDS database from AWS Lambda.
You can use node-mysql library.
Link: https://github.com/felixge/node-mysql/
However, there is a big caveat that goes with it.
AWS Lambda does not (currently) have access to private subnets inside a VPC. So in order for AWS Lambda to access your RDS database, it must be publicly accessible, which could be a security risk for you.
Update (2015-10-30): AWS Lambda announced upcoming VPC support (as of re:Invent 2015), so this won't be an issue for much longer.
Update (2015-11-17): AWS Lambda still does not have VPC support.
Update (2016-02-11): AWS Lambda can now access VPC resources:
https://aws.amazon.com/blogs/aws/new-access-resources-in-a-vpc-from-your-lambda-functions/
To achieve this functionality, your Lambda function will actually execute inside your VPC in a subnet. Some caveats come with this functionality:
The VPC subnet needs enough free IP addresses to handle Lambda's scaling
If your Lambda function needs internet access, then its designated VPC subnet will need an Internet Gateway or NAT
try this tutorial:
http://docs.aws.amazon.com/lambda/latest/dg/vpc-rds.html
In this tutorial, you do the following:
Launch an Amazon RDS MySQL database engine instance in your default Amazon VPC.
In the MySQL instance, you create a database (ExampleDB) with a sample table (Employee) in it.
Create a Lambda function to access the ExampleDB database, create a table (Employee), add a few records, and retrieve the records from the table.
Invoke the Lambda function manually and verify the query results.
Since Lambda supports Node.js, Java, and Python as backend programming/scripting languages, you can definitely use it to connect to RDS. (Link)
Finally, This is the documentation on specifying IAM Roles when connecting to RDS. (See image below):
I just wanted to update the database from my lambda function. Is it possible to access RDS by specifiying IAM Role and access Policy?.
No, you cannot. You need to provide the DB URL/username/password to connect. You may also need to run the Lambda in the same VPC if the database is in a private subnet. See my pointers below.
I can connect to mysql databse using mysql client.but when i try on lambda i can't do that.
This is a strict no-no! Your RDS instance should not be accessible from the internet unless you really need it to be. Try to run it in a private subnet and configure other AWS services accordingly.
Two cents from my end if you are getting timeouts accessing resources from Lambda:
By default Lambda has internet access and can access online resources.
Lambda cannot access services running in a private subnet of your VPC.
To connect to services in a private subnet you need to run the Lambda in a private subnet. For this you need to go to the Network section and configure your VPC, subnets and security group.
However, note that when you do this you will lose internet access. If you still need internet access you will have to spin up a NAT gateway or NAT instance in a public subnet and configure a route from the private subnet to this NAT.
I faced this when I was trying to connect to RDS in a private subnet from my Lambda. Since I used KMS to encrypt some environment variables, and the decryption part requires internet access, I had to use a NAT gateway (a sketch of that decryption call follows the link below).
More details - http://docs.aws.amazon.com/lambda/latest/dg/vpc.html#vpc-internet
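For context, here is a hedged Python sketch of the kind of decryption call that needs that network path; the environment variable name is a placeholder:

import base64
import os
import boto3

# Decrypting a KMS-encrypted environment variable requires a network path
# to KMS (via NAT or a KMS VPC endpoint); the variable name is illustrative.
ciphertext = base64.b64decode(os.environ['ENCRYPTED_DB_PASSWORD'])
plaintext = boto3.client('kms').decrypt(CiphertextBlob=ciphertext)['Plaintext'].decode('utf-8')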
How to connect to postgres RDS from AWS Lambda
PS: Above links go to my personal blog that has additional relevant information.