I have a Lambda function that accesses an S3 bucket using the aws-sdk.
There is a high number of operations (requests) to the S3 bucket, which is considerably increasing the cost of using Lambda.
I was hoping that the requests would use the s3:// protocol, but they are going over the internet.
I understand that one solution could be:
Attach the Lambda to a VPC
Create a VPC endpoint to S3
Update the route tables of the VPC
Is there a simpler way to do so?
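The three steps listed above could be sketched with boto3 roughly as follows. This is only a sketch under assumptions: the VPC ID, Region and route table IDs are placeholders, the helper names are hypothetical, and `ec2_client` is expected to be a `boto3.client("ec2")` instance passed in by the caller. For a gateway endpoint, passing `RouteTableIds` lets AWS add the S3 route to those tables when the endpoint is created.

```python
def build_s3_gateway_endpoint_request(vpc_id, region, route_table_ids):
    """Build keyword arguments for EC2 create_vpc_endpoint (S3 gateway endpoint)."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        # S3 service name for the Region the VPC and bucket live in.
        "ServiceName": f"com.amazonaws.{region}.s3",
        # Route tables of the subnets the Lambda function is attached to.
        "RouteTableIds": list(route_table_ids),
    }


def create_s3_gateway_endpoint(ec2_client, vpc_id, region, route_table_ids):
    """Create the endpoint and return its ID.

    ec2_client is assumed to be boto3.client("ec2", region_name=region).
    """
    params = build_s3_gateway_endpoint_request(vpc_id, region, route_table_ids)
    response = ec2_client.create_vpc_endpoint(**params)
    return response["VpcEndpoint"]["VpcEndpointId"]


# Hypothetical usage (IDs are placeholders, not real resources):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="eu-west-1")
#   create_s3_gateway_endpoint(ec2, "vpc-0123456789abcdef0", "eu-west-1",
#                              ["rtb-0123456789abcdef0"])
```

Attaching the Lambda function to the VPC subnets is a separate configuration step on the function itself and is not shown here.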
An alternative could be to create an API Gateway with a Lambda proxy integration, following the AWS guide or tutorial.
You can then configure API Gateway to act as your external-facing integration over the internet, while your Lambda function and S3 bucket stay within AWS.
The traffic won't go over the internet and incur additional data transfer cost as long as the non-VPC lambda function is executing in the same region as the S3 bucket. So VPC is not needed in this case.
https://aws.amazon.com/s3/pricing/
You pay for all bandwidth into and out of Amazon S3, except for the following:
• Data transferred in from the internet.
• Data transferred out to an Amazon Elastic Compute Cloud (Amazon EC2) instance, when the instance is in the same AWS Region as the S3 bucket.
• Data transferred out to Amazon CloudFront (CloudFront).
For data transfer pricing you can think of Lambda as EC2. So the data transfer is free within the same Region, but be careful: you still pay for each API request.
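To see how request charges add up even when data transfer is free, here is a small arithmetic sketch. The function name is hypothetical and the price used in the example is a placeholder, not a quoted rate; take the actual per-request price for your Region and storage class from the pricing page linked above.

```python
def s3_request_cost(num_requests, price_per_1000_requests):
    """Estimate the request charge for a number of S3 API calls.

    price_per_1000_requests is whatever the pricing page lists for the
    request type (GET, PUT, ...) in your Region; it is an input here,
    not a value this sketch knows.
    """
    return num_requests / 1000 * price_per_1000_requests


# Example with a placeholder price: 5 GET requests per invocation,
# 2 million invocations per month.
requests_per_month = 5 * 2_000_000
placeholder_price = 0.0004  # assumed value, check the pricing page
print(f"Estimated request charge: {s3_request_cost(requests_per_month, placeholder_price):.2f}")
```

If the request count is what drives the bill, reducing the number of calls (for example by caching objects between invocations) matters more than the network path.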
For example:
Consider a scenario where I have a back-end service which takes dynamic data from RDS and static data (audio/video/PDF) from an S3 bucket.
The back-end service is deployed on an EC2 instance, which internally uses the AWS SDK to fetch static data from the S3 bucket. Below is the flow:
User Request Data ---> AWS Route 53 ---> ALB ---> Target EC2 Instance ---> Fetch Data from S3 Bucket.
Based on the above scenario, if a user request is always routed to the EC2 instance, and the EC2 instance and S3 are in the same region, is there any need to configure CloudFront in the flow?
Yes, I strongly recommend using CloudFront with S3 for your static data.
In fact, this is one of its primary use cases. It gives you an advantage not only in terms of latency and cost but also in terms of security, because you can choose who can access content from your S3 bucket using an OAI (origin access identity).
If you want to know more and understand how CloudFront can help you, here is a dedicated blog post from AWS on this use case -> https://aws.amazon.com/blogs/networking-and-content-delivery/amazon-s3-amazon-cloudfront-a-match-made-in-the-cloud/
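As an illustration of the OAI point, the bucket policy that lets only a given CloudFront origin access identity read objects could look roughly like the one built below. This is a sketch: the bucket name and OAI ID are placeholders, the helper name is hypothetical, and the policy would still need to be attached to the bucket (for example through the console).

```python
import json


def oai_bucket_policy(bucket_name, oai_id):
    """Build an S3 bucket policy granting read access to one CloudFront OAI."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontOAIRead",
                "Effect": "Allow",
                "Principal": {
                    "AWS": (
                        "arn:aws:iam::cloudfront:user/"
                        f"CloudFront Origin Access Identity {oai_id}"
                    )
                },
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


# Placeholder values for illustration only.
print(json.dumps(oai_bucket_policy("example-static-bucket", "EXAMPLEOAIID"), indent=2))
```

With a policy like this and public access blocked on the bucket, objects are reachable through the CloudFront distribution but not through direct S3 URLs.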
I have an AWS project that contains an S3 bucket, an RDS database and Lambda functions.
I want Lambda to have access to both the S3 bucket and the RDS database.
The Lambda function connects to the RDS database correctly, but it times out when trying to retrieve an object from the S3 bucket:
Event needs-retry.s3.GetObject: calling handler <bound method S3RegionRedirectorv2.redirect_from_error of <botocore.utils.S3RegionRedirectorv2 object at 0x7f473a4ae910>>
...
(some more error lines)
...
botocore.exceptions.ConnectTimeoutError: Connect timeout on endpoint URL: "https://{bucket name}.s3.eu-west-3.amazonaws.com/{file name}.tar.gz"
So I understand that the reason is that the Lambda function doesn't have internet access, and therefore my options are:
VPC endpoint (privatelink): https://aws.amazon.com/privatelink
NAT gateway for Lambda
But both go over the cloud (in the same region), which doesn't make any sense, as they are both in the same project.
It's just a redundant cost for such a detail, and there must be a better solution, right?
Maybe it helps to think of the S3 bucket "in the same project" as permission to use an object store that resides in a different network, outside your own. Your Lambda function is in a VPC, but S3 objects are not in your VPC. You access them either through public endpoints (over the internet) or privately, by setting up an S3 gateway endpoint or a VPC interface endpoint. Neither endpoint type uses the public internet.
As long as you are staying in the same region, an S3 gateway endpoint does not cost you money, but if you need to cross regions, you will need to use a VPC interface endpoint. The differences are documented here: https://docs.aws.amazon.com/AmazonS3/latest/userguide/privatelink-interface-endpoints.html
If you are trying to avoid costs, an S3 gateway endpoint might work for you; however, you will need to update the route tables that are associated with the gateway. The process is documented here: https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-s3.html
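One way to check whether the function can reach S3 at all, before and after adding the endpoint, is a plain TCP connection test from inside the Lambda function. This is a minimal diagnostic sketch using only the standard library; the helper names, bucket name and timeout are assumptions for illustration.

```python
import socket


def s3_endpoint_host(bucket, region):
    """Virtual-hosted-style S3 hostname, matching the URL shape in the error above."""
    return f"{bucket}.s3.{region}.amazonaws.com"


def can_connect(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, DNS failures and refused connections.
        return False


# Hypothetical usage inside the Lambda handler (bucket name is a placeholder):
#   host = s3_endpoint_host("example-bucket", "eu-west-3")
#   print(host, "reachable:", can_connect(host))
```

If this returns False from inside the VPC, the problem is the network path (no endpoint route, no NAT) rather than IAM permissions, which would produce an access-denied error instead of a connect timeout.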
I have a task where I have to check whether it is possible to serve a secure website where the
content is served from S3 and dynamic data is served from RDS.
Is it possible to do this, or do I need EC2 instances as well?
Thanks for helping me,
Yes, this is possible: static assets (HTML/JS/CSS/images) all stored on S3, a CloudFront distribution pointing to your S3 location, an API Gateway layer to act as the endpoints for your API calls, AWS Lambda functions behind those endpoints with custom code to perform the actual RDS queries, and authentication done by Amazon Cognito.
All this can be done without ec2.
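The Lambda part of that setup could look roughly like the handler below, written for an API Gateway Lambda proxy integration. It is a sketch only: `fetch_items` is a hypothetical stand-in for the real RDS query (which would need a database driver and connection settings not shown here), and the `limit` query parameter is invented for illustration.

```python
import json


def fetch_items(limit):
    """Hypothetical placeholder for the RDS query; returns canned rows."""
    return [{"id": i, "name": f"item-{i}"} for i in range(1, limit + 1)]


def lambda_handler(event, context):
    """Handle an API Gateway proxy event and return a proxy-format response."""
    # queryStringParameters is null (None) when the request has no query string.
    params = event.get("queryStringParameters") or {}
    try:
        limit = int(params.get("limit", 10))
    except ValueError:
        return {
            "statusCode": 400,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"error": "limit must be an integer"}),
        }
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        # The proxy integration expects the body as a string.
        "body": json.dumps({"items": fetch_items(limit)}),
    }
```

The static site served from S3 and CloudFront would call this endpoint from the browser, so no EC2 instance sits in the request path.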
I use Apache Spark and Redshift in a VPC, and I also use AWS S3 for source data and for temp data for Redshift COPY.
I suspect that the performance of reads and writes to AWS S3 is not good enough, and based on the suggestion in the following discussion https://github.com/databricks/spark-redshift/issues/318 I have created an S3 endpoint within the VPC. So far I can't see any performance difference before and after creating the S3 endpoint when I load data from S3.
In Apache Spark I read data in the following way:
spark.read.csv("s3://example-dev-data/dictionary/file.csv")
Do I need to add or configure anything extra in Apache Spark on AWS EMR in order to use the AWS S3 endpoint properly?
The S3 VPC endpoint is a gateway endpoint, so you have to add a new entry to the route tables of the subnets where you start your EMR clusters, routing the S3 traffic to the endpoint.
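To verify that the route entry is actually in place, one option is to inspect the output of EC2 `describe_route_tables` and look for a route whose `GatewayId` is the endpoint ID. The helper below is a sketch with a hypothetical name; it works on the response dictionary, so the boto3 call itself is left to the caller, and all IDs shown are placeholders.

```python
def route_tables_using_endpoint(describe_route_tables_response, vpc_endpoint_id):
    """Return IDs of route tables that contain a route to the given gateway endpoint."""
    matching = []
    for table in describe_route_tables_response.get("RouteTables", []):
        for route in table.get("Routes", []):
            # Gateway endpoint routes carry the endpoint ID (vpce-...) as GatewayId.
            if route.get("GatewayId") == vpc_endpoint_id:
                matching.append(table["RouteTableId"])
                break
    return matching


# Hypothetical usage (IDs are placeholders):
#   import boto3
#   ec2 = boto3.client("ec2")
#   response = ec2.describe_route_tables()
#   print(route_tables_using_endpoint(response, "vpce-0123456789abcdef0"))
#
# A missing route table can be associated with the endpoint using:
#   ec2.modify_vpc_endpoint(VpcEndpointId="vpce-0123456789abcdef0",
#                           AddRouteTableIds=["rtb-0123456789abcdef0"])
```

If the route table of the EMR subnets is not in the returned list, S3 traffic from the cluster is still taking the previous path.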
Suppose I create a vpc and a vpc-endpoint in region1.
Can I communicate to an s3-bucket-in-region2 using this vpc-endpoint, i.e. without using the internet?
No, VPC endpoints do not support cross-region requests. Your bucket(s) need to be in the same region as the VPC.
Endpoints for Amazon S3
Endpoints currently do not support cross-region requests—ensure that
you create your endpoint in the same region as your bucket. You can
find the location of your bucket by using the Amazon S3 console, or by
using the get-bucket-location command. Use a region-specific Amazon S3
endpoint to access your bucket; for example,
mybucket.s3-us-west-2.amazonaws.com. For more information about
region-specific endpoints for Amazon S3, see Amazon Simple Storage
Service (S3) in Amazon Web Services General Reference.
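The bucket-location check mentioned in the quote can also be done from code. The sketch below assumes `s3_client` is a `boto3.client("s3")` instance supplied by the caller; the helper names and the bucket name are hypothetical. `get_bucket_location` returns no location constraint for buckets in us-east-1, and the legacy value `EU` for some older eu-west-1 buckets, so both cases are normalized.

```python
def bucket_region(s3_client, bucket):
    """Return the Region a bucket lives in, using S3 get_bucket_location."""
    location = s3_client.get_bucket_location(Bucket=bucket).get("LocationConstraint")
    if location is None:
        return "us-east-1"  # us-east-1 buckets report no LocationConstraint
    if location == "EU":
        return "eu-west-1"  # legacy value for older eu-west-1 buckets
    return location


def endpoint_can_serve_bucket(s3_client, bucket, vpc_region):
    """True if the bucket is in the same Region as the VPC hosting the gateway endpoint."""
    return bucket_region(s3_client, bucket) == vpc_region


# Hypothetical usage (bucket name is a placeholder):
#   import boto3
#   s3 = boto3.client("s3")
#   print(endpoint_can_serve_bucket(s3, "example-bucket", "us-west-2"))
```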