I have an EC2 instance in a public subnet and will be uploading data to an S3 bucket.
I understand that while this traffic traverses the internet gateway, it does not leave the AWS network.
Reference: https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-access-aws-services.html
Now I am creating an S3 gateway endpoint (and modifying the route table to send traffic through it).
I also tried creating an S3 interface endpoint.
I measured that the time it takes to upload a 250 MB file is the same in both cases (gateway endpoint and interface endpoint).
I am unable to understand two points:
If traffic does NOT leave the AWS network even when the internet gateway is used, is there any security benefit in using an endpoint?
When using PrivateLink, I understand that the traffic goes through Hyperplane, which is why I get increased upload speed.
https://www.youtube.com/watch?v=8gc2DgBqo9U&t=2010s
And AWS charges for the interface endpoint.
However, I don't understand why S3 gateway endpoints are free.
Do they not use Hyperplane?
Are they less performant or resilient?
You have three options for uploading data from EC2 to S3:
Internet gateway: in this case, traffic DOES leave your VPC, although it stays on the AWS network. It's less secure and slower.
Interface endpoint: traffic DOES NOT leave your VPC and goes directly to service. It's secure and fast, but it isn't free.
Gateway endpoint: traffic DOES NOT leave your VPC and goes directly to service. It's secure, fast and free, though you are limited to DynamoDB and S3 services.
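Interface endpoints are built on AWS PrivateLink (Hyperplane). Gateway endpoints, according to the AWS documentation, do not use AWS PrivateLink; they work through route table entries instead.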
I think the difference in pricing is related to the difference in how the two are deployed:
An interface endpoint is basically a separate ENI in a subnet, whereas
a gateway endpoint is an attachment at the VPC level that accepts S3/DynamoDB traffic.
Interface endpoints, being ENIs, get an IP address allocated and are under your (the customer's) control. Hence, you are consuming AWS network resources.
At the same time, underlying network resources for Gateway endpoints are not exposed to you (customer), hence, you as a customer, do not reserve any AWS resources and there is nothing to be charged for.
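To make the deployment difference concrete, here is a minimal sketch of the request parameters each endpoint type takes. The parameter names follow boto3's EC2 `create_vpc_endpoint` call; the helper functions are hypothetical, and the VPC, route table, subnet and security group IDs and the region are placeholders. The sketch only builds the dictionaries. Passing one of them to `boto3.client("ec2").create_vpc_endpoint(**params)` would be the actual call, which needs credentials and is not run here.

```python
def gateway_endpoint_params(vpc_id, region, route_table_ids):
    """Gateway endpoint: attached to the VPC and wired in via route tables."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }


def interface_endpoint_params(vpc_id, region, subnet_ids, security_group_ids):
    """Interface endpoint: an ENI in each listed subnet, guarded by security groups."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
    }


# Placeholder identifiers, for illustration only.
gw = gateway_endpoint_params("vpc-0example", "us-east-1", ["rtb-0example"])
iface = interface_endpoint_params(
    "vpc-0example", "us-east-1", ["subnet-0example"], ["sg-0example"]
)
```

Note that only the interface variant names subnets and security groups, which matches the point above that it places ENIs inside your subnets.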
I am trying to understand how VPC endpoints work, and I am not sure that I understand the AWS documentation. For example, I have a private S3 bucket and an EKS cluster. If my bucket is private, I believe that traffic from the EKS cluster to S3 does not go through the internet, but only through the AWS network. If my S3 bucket were public, I would probably need to set up a VPC endpoint so that traffic does not leave AWS. I would expect the same logic with ECR: if it is private, you load images into your EKS cluster through the AWS network.
So what is the exact case when you need to use VPC endpoint within your AWS account (not from on-prem or another VPC)?
VPC endpoints are typically used with public AWS services (such as S3, DynamoDB, ECR, etc.) when the client applications are hosted inside your VPC and you do not want to route traffic via public Internet, which would otherwise result in a number of hops to reach the AWS service.
Imagine a situation where you have an app running on an EC2 instance that is deployed to a private subnet of your VPC (e.g. a Pod in your EKS cluster). This app reads/writes data from/to AWS S3. If you do not use a VPC endpoint, your traffic will first reach your NAT gateway, then go through your VPC's Internet gateway out to the public Internet. Eventually, it will hit AWS S3. The response will travel back via the same route.
The same goes for ECR (e.g. a new instance of your Kubernetes Pod started by the kubelet). It's better (i.e. quicker) to pick the shortest route to download a Docker image from ECR rather than traverse a number of switches/routers. With a VPC endpoint your traffic will first hit the VPC endpoint (without leaving your private subnet) and then reach e.g. ECR directly (traffic does not leave the Amazon network).
As correctly mentioned by @jarmod, one should differentiate between routing (Layer 3 in the OSI model) and authentication/authorization (Layer 7). For example, you can use a VPC endpoint to reach AWS S3 and still not be authorized (or even authenticated) to e.g. read a file from an S3 bucket.
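To illustrate that authorization is a separate layer from routing, here is a minimal sketch of a bucket policy that denies requests not arriving through a particular VPC endpoint. It relies on the `aws:SourceVpce` condition key; the helper name, bucket name and endpoint ID are hypothetical placeholders, and a policy this strict also blocks access from anywhere else, so treat it as an illustration rather than a recommendation.

```python
import json


def vpce_only_bucket_policy(bucket_name, vpce_id):
    """Build a policy that denies S3 access unless it comes via vpce_id."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnlessFromVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


# Placeholder names, for illustration only.
policy = vpce_only_bucket_policy("example-bucket", "vpce-0example")
policy_json = json.dumps(policy, indent=2)
```

Even with such a policy in place, a caller coming through the endpoint still needs IAM permissions of its own; the route only decides how the request gets there.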
Hope this clarifies the idea behind using VPC endpoints.
I'm trying to lower egress fees from multiple S3 buckets in one AWS account (several Terabytes per month) to our US datacenter.
I thought of setting up a VPC on our AWS account, and using a Gateway Endpoint for S3 and then having a Direct Connect always active from our datacenter to our AWS VPC (and pay the usual hourly + GB transferred reduced fees).
After reading the documentation, it is my understanding that I will not pay for any traffic through the Gateway Endpoint for S3, since traffic never leaves AWS until it reaches our AWS VPC. Transferring it to our datacenter is then billed at the usual hourly + reduced per-GB fees for Direct Connect.
Is this correct? Will we still be able to initiate a GET from our datacenter applications, through the Direct Connect to this VPC Gateway Endpoint for S3, so that we can pull S3 files from the 3rd party AWS account that endpoint is linked to? (Requests would only originate from our datacenter servers to get or sometimes put S3 files.)
You will not be able to route to the gateway endpoint via Direct Connect. However, you can configure a public VIF on your Direct Connect to route all traffic to the Amazon IP spaces via the Direct Connect.
From: Which type of virtual interface should I use to connect different resources in AWS?
To connect to AWS resources that are reachable by a public IP address
(such as an Amazon Simple Storage Service bucket) or AWS public
endpoints, use a public virtual interface.
Alternatively, you can use the recently released interface endpoint for S3 (AWS Announcement - Amazon S3 now supports AWS PrivateLink). To get this working, you need to configure your application to use this endpoint. However, this may work against your goal, because you will be charged a small fee for each GB processed by the interface endpoint. Therefore the AWS Direct Connect public VIF should be the right choice for you.
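As a rough illustration of "configure your application to use this endpoint": an S3 interface endpoint exposes endpoint-specific DNS names, and clients are pointed at them explicitly. The sketch below derives the URL from the wildcard DNS name shown for the endpoint. The `bucket.` prefix follows the pattern in AWS's PrivateLink for S3 documentation as I understand it, so verify it against your own endpoint; the helper name and the endpoint ID are made up.

```python
def s3_interface_endpoint_url(endpoint_dns_name):
    """Turn '*.vpce-....s3.<region>.vpce.amazonaws.com' into a bucket endpoint URL."""
    name = endpoint_dns_name.strip()
    if name.startswith("*."):
        name = name[2:]  # drop the wildcard label
    return f"https://bucket.{name}"


# Hypothetical endpoint DNS name, for illustration only.
url = s3_interface_endpoint_url(
    "*.vpce-1a2b3c4d-5e6f.s3.us-east-1.vpce.amazonaws.com"
)

# With boto3 installed and credentials configured, the URL would be passed as:
#   boto3.client("s3", endpoint_url=url)
```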
I am using a few AWS Lambda functions that sit inside private subnets.
These private subnets have VPC endpoints configured for the services the functions need to access.
The current setup does not use a NAT gateway, so all traffic from the functions goes through the VPC endpoints.
I now have a use case where we need a NAT gateway.
Would enabling NAT mean that the functions no longer use the VPC endpoints for external service access, and use the NAT instead?
I think this works as follows. For:
Gateway endpoints (S3, DynamoDB)
Routes to them are added automatically to your route tables when you create them. The docs say:
If you have an existing route in your route table for all internet
traffic (0.0.0.0/0) that points to an internet gateway, the endpoint
route takes precedence for all traffic destined for the service,
because the IP address range for the service is more specific than
0.0.0.0/0. All other internet traffic goes to your internet gateway, including traffic that's destined for the service in other Regions.
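The "more specific route wins" rule quoted above is longest-prefix matching. Here is a minimal sketch of it using Python's `ipaddress` module. The route table is invented: a real gateway endpoint route uses a prefix list ID as its destination, so `198.51.100.0/24` (a documentation-only range) merely stands in for the service's address range, and the target names are placeholders.

```python
import ipaddress


def pick_route(route_table, destination):
    """Return the target of the most specific route containing destination."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for cidr, target in route_table:
        network = ipaddress.ip_network(cidr)
        if dest in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None
    # The longest prefix (largest prefixlen) is the most specific match.
    return max(matches, key=lambda m: m[0])[1]


# Hypothetical route table: local VPC range, default route, endpoint route.
ROUTES = [
    ("10.0.0.0/16", "local"),
    ("0.0.0.0/0", "default-gateway"),
    ("198.51.100.0/24", "s3-gateway-endpoint"),  # stand-in for the S3 prefix list
]

to_service = pick_route(ROUTES, "198.51.100.25")
to_elsewhere = pick_route(ROUTES, "203.0.113.9")
```

Here `to_service` selects the endpoint route while `to_elsewhere` falls through to the default route. The selection logic does not depend on what the default route points at.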
Interface VPC Endpoints
They work by changing the IP addresses that the service's DNS name resolves to. The name will resolve to the private addresses of the endpoint network interfaces. The docs say:
The hosted zone contains a record set for the default DNS name for the
service (for example, ec2.us-east-1.amazonaws.com) that resolves to
the private IP addresses of the endpoint network interfaces in your
VPC. This enables you to make requests to the service using its
default DNS hostname instead of the endpoint-specific DNS hostnames.
To use private DNS, you must set the following VPC attributes to true:
enableDnsHostnames and enableDnsSupport.
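One way to check this behaviour from inside the VPC is to resolve the service's default hostname and see whether every returned address is private. The following is a minimal sketch with hypothetical helper names; `resolve` performs a live DNS lookup and is therefore only defined, not called, and the sample addresses are invented.

```python
import ipaddress
import socket


def resolve(hostname):
    """Live DNS lookup; returns the distinct addresses for hostname."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def all_private(addresses):
    """True if there is at least one address and all of them are private."""
    addresses = list(addresses)
    return bool(addresses) and all(
        ipaddress.ip_address(a).is_private for a in addresses
    )


# Invented sample data: addresses as they might look with and without
# private DNS for an interface endpoint.
with_endpoint = all_private(["10.0.1.25", "10.0.2.40"])
without_endpoint = all_private(["52.94.0.1"])  # arbitrary public address

# From an instance in the VPC one could run, for example:
#   all_private(resolve("ec2.us-east-1.amazonaws.com"))
```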
Conclusion
So in both cases, priority is given to the endpoints, not the internet route. I recommend checking the links provided. They have more info, with examples, to double-check my conclusions.
VPC Endpoints or NAT Gateway?
AWS services like EC2, RDS, Lambda, and ElastiCache come with an Elastic Network Interface (ENI), which enables communication from within your VPCs via Private Endpoints. However, many AWS services provide a REST API, available via the Internet only. A few examples: S3, DynamoDB, CloudWatch, SQS, and Kinesis.
There are three options to make these services accessible from private subnets:
A VPC Endpoint of type Gateway Endpoint is free of charge, but is only available for S3 and DynamoDB.
A VPC Endpoint of type Interface Endpoint costs $7.20 per month and AZ plus $0.01 per GB, and is available for most AWS services.
A NAT Gateway can be used to access AWS services or any other services with a public API. Costs are $32.40 per month and AZ plus $0.045 per GB.
Keep the following rules of thumb in mind when designing your network architecture.
Adding Gateway Endpoints for S3 and DynamoDB should be your default option.
If you need to access non-AWS resources via the Internet, add a NAT Gateway. Then do the math to see whether traffic to AWS services justifies additional Interface Endpoints.
Are you only accessing AWS services from the private subnets? No more than four different services? Use Interface Endpoints. Otherwise, do the math to calculate costs for Interface Endpoints and NAT Gateway.
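"Doing the math" can be sketched as follows, using only the prices quoted above. They may be outdated and vary by region. The sketch assumes one Interface Endpoint per service per AZ and one NAT Gateway per AZ, and the function names are made up for illustration.

```python
# Prices as quoted in the text above (USD); treat them as assumptions.
INTERFACE_MONTHLY_PER_AZ = 7.20
INTERFACE_PER_GB = 0.01
NAT_MONTHLY_PER_AZ = 32.40
NAT_PER_GB = 0.045


def interface_endpoints_cost(services, azs, gb):
    """Monthly cost of one Interface Endpoint per service in each AZ."""
    return services * azs * INTERFACE_MONTHLY_PER_AZ + gb * INTERFACE_PER_GB


def nat_gateway_cost(azs, gb):
    """Monthly cost of one NAT Gateway in each AZ."""
    return azs * NAT_MONTHLY_PER_AZ + gb * NAT_PER_GB


def cheaper_option(services, azs, gb):
    """Name the cheaper option for the given service count, AZs and traffic."""
    endpoints = interface_endpoints_cost(services, azs, gb)
    nat = nat_gateway_cost(azs, gb)
    return "interface endpoints" if endpoints < nat else "nat gateway"


# Fixed fees only (no traffic), single AZ: four services vs. five services.
four_services = cheaper_option(services=4, azs=1, gb=0)
five_services = cheaper_option(services=5, azs=1, gb=0)
```

With no traffic, four endpoints cost 4 x $7.20 = $28.80 against $32.40 for the NAT Gateway, which is where the "no more than four services" rule of thumb comes from. Once traffic is added, the lower per-GB price of Interface Endpoints moves the break-even point.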
Ref Link: https://cloudonaut.io/advanved-aws-networking-pitfalls-that-you-should-avoid/
This question is inspired by this tweet by someone who accidentally and unexpectedly incurred a large bill due to NAT gateway.
I'm using EC2 to process terabytes of data from an S3 bucket. The bucket and the instance are in the same region.
My goal is to minimize costs. In particular, I want to pay $0 for S3 data transfer costs. According to the S3 pricing page, this should be possible:
Transfers between S3 buckets or from Amazon S3 to any service(s) within the same AWS Region are free.
My instance is in a VPC, has a public IP address, no NAT gateway, no S3 gateway endpoint.
I observe that over months of doing this, I'm not being charged. Whereas traceroute from a server in a different region shows intermediate hops to the S3 host, the route from a server in the same region shows no intermediate hops to the S3 endpoint. Is this always guaranteed? Could Amazon's DNS resolver one day give me an IP address that requires routing over the public Internet, thus incurring thousands of dollars of fees?
This question seems a bit related, but doesn't really address the core question.
The tweet does not appear to accurately reflect the true nature of the charges they incurred.
(I'm not saying they weren't charged, I'm saying that it isn't correct to describe it as if S3 isn't free in this case, even though the tweet implies that this is the case.)
S3 traffic to/from other services within the same region isn't free with a * -- it's just free.
Transfers between S3 buckets or from Amazon S3 to any service(s) within the same AWS Region are free.
https://aws.amazon.com/s3/pricing/
That doesn't say anything about the routing of the traffic, and the routing of the traffic is not important, because -- back to the tweet -- they would not have been billed those usage charges by Amazon S3.
They would have been billed by Amazon VPC for using a NAT Gateway. What you access through a NAT Gateway isn't relevant, because the "data processing" charge always applies to traffic passing through it.
Data processing charges apply for each Gigabyte processed through the NAT gateway regardless of the traffic’s source or destination. (emphasis added)
https://aws.amazon.com/vpc/pricing/
The NAT Gateway pricing page (including old versions like this one) specifically mentions that accessing S3 through a NAT Gateway is subject to all the charges applicable to NAT Gateway.
Accessing S3 within the same region using either an EC2 instance with a public IP address or using an S3 endpoint does not incur any data transfer charges.
When you access S3 within the region, the traffic -- by the relevant definition -- doesn't leave the region, because objects stored in a given region are always located in the region.
Objects stored in a Region never leave the Region unless you explicitly transfer them to another Region.
https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html
As long as you aren't using a NAT Gateway, or doing something similarly sub-optimal, like accessing S3 by transiting an EC2 NAT Instance or forward proxy (e.g. Squid) in another region (which would result in cross-region traffic charges between your client instance and the NAT Instance or proxy billed by VPC or EC2 -- not S3) then you should not expect to pay for data transfer related to S3 within a region.
If I am downloading an S3 object from an EC2 instance, does this request leave the Amazon network, or is the request made via Internet?
It depends whether the EC2 instance and S3 bucket are in the same region or not.
All communication between regions is across the public Internet.
You can read more about AWS regions and availability zones here. Communication within the same region happens over low-latency private links:
Availability Zones are connected to each other with fast, private
fiber-optic networking.
See AWS Global Infrastructure.
EDIT
Although data transfer happens over private links within the same region, accessing the API endpoints using the SDK or CLI still requires Internet access. See AWS Regions and Endpoints.
If you're concerned about security: in the Java SDK, the default client configuration is to use HTTPS for all requests. (Individual clients can override this setting by explicitly including the protocol as part of the endpoint URL when calling AmazonWebServiceClient.setEndpoint(String).)
If you're concerned about data transfer cost, all inbound traffic from S3 to EC2 is free of charge.