I'm trying to lower egress fees from multiple S3 buckets in one AWS account (several Terabytes per month) to our US datacenter.
I thought of setting up a VPC on our AWS account, and using a Gateway Endpoint for S3 and then having a Direct Connect always active from our datacenter to our AWS VPC (and pay the usual hourly + GB transferred reduced fees).
After reading the documentation, my understanding is that I will not pay for traffic through the Gateway Endpoint for S3, since the traffic never leaves AWS until it reaches our AWS VPC. Transferring it to our datacenter is then billed at the usual Direct Connect hourly + reduced per-GB rates.
Is this correct? Will we still be able to initiate a GET from our datacenter applications (through the Direct Connect to this VPC Gateway Endpoint for S3) so that we can pull S3 files from the 3rd party AWS account that endpoint is linked to? (Requests would only originate from our datacenter servers, to get or sometimes put S3 files.)
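You will not be able to route to the gateway endpoint via Direct Connect: a gateway endpoint is only reachable from inside the VPC it is attached to, not from an on-premises network. However, you can configure a public VIF on your Direct Connect to route all traffic to the Amazon IP space via the Direct Connect.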
From: Which type of virtual interface should I use to connect different resources in AWS?
To connect to AWS resources that are reachable by a public IP address
(such as an Amazon Simple Storage Service bucket) or AWS public
endpoints, use a public virtual interface.
Alternatively, you can use the recently released interface endpoint for S3 (AWS Announcement - Amazon S3 now supports AWS PrivateLink). To get this working, you need to configure your application to use this endpoint. However, this may work against your goal, because you will be charged a small fee for each GB processed by the interface endpoint. Therefore, the AWS Direct Connect public VIF should be the right choice for you.
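To make the trade-off concrete, here is a minimal sketch of the arithmetic. Every rate in it is a made-up placeholder, not an AWS price, and the names `Path` and `monthly_cost` are hypothetical; plug in the numbers from the current Direct Connect and PrivateLink pricing pages. The Direct Connect port-hour fee is left out because both options pay it.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """One way of pulling S3 data to the datacenter (all rates are placeholders)."""
    name: str
    per_gb_transfer: float    # Direct Connect data-transfer-out rate, USD per GB
    per_gb_processing: float  # extra per-GB processing fee (interface endpoint), USD per GB
    hourly: float             # extra fixed hourly fee (interface endpoint hours), USD per hour


def monthly_cost(path: Path, gb: float, hours: float = 730.0) -> float:
    """Monthly cost for `gb` gigabytes over `path`, excluding the shared port fee."""
    return gb * (path.per_gb_transfer + path.per_gb_processing) + hours * path.hourly


# Hypothetical rates, for illustration only.
public_vif = Path("public VIF", 0.02, 0.0, 0.0)
interface_endpoint = Path("interface endpoint over private VIF", 0.02, 0.01, 0.01)

if __name__ == "__main__":
    gb = 5 * 1024  # "several terabytes per month"
    for p in (public_vif, interface_endpoint):
        print(f"{p.name}: ${monthly_cost(p, gb):.2f}")
```

With any positive processing or hourly fee, the interface endpoint path costs more than the public VIF for the same volume, which is the point made above.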
Related
I have an EC2 instance in a public subnet and will be uploading data to an S3 bucket.
I understand that while this traffic traverses the internet gateway, it does not leave the AWS network.
Reference: https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-access-aws-services.html
Now I am creating an S3 gateway endpoint (and modifying the route table to send traffic through it).
I also tried creating an S3 interface endpoint.
I measure that the time it takes to upload a 250MB file is the same in both cases (gateway endpoint and interface endpoint)
I am unable to understand two points:
If traffic does NOT leave the AWS network even though an internet gateway is used, is there any security benefit to using an endpoint?
When using PrivateLink, I understand that the traffic goes through Hyperplane, which is why I get increased upload speed.
https://www.youtube.com/watch?v=8gc2DgBqo9U&t=2010s
And AWS charges for the interface endpoint.
However, I don't understand why S3 gateway endpoints are free.
Do they not use Hyperplane?
Are they less performant or resilient?
You have three options for uploading data from EC2 to S3:
Internet gateway: in this case, traffic DOES leave your VPC and goes over the AWS network. It's less secure and slower.
Interface endpoint: traffic DOES NOT leave your VPC and goes directly to service. It's secure and fast, but it isn't free.
Gateway endpoint: traffic DOES NOT leave your VPC and goes directly to service. It's secure, fast and free, though you are limited to DynamoDB and S3 services.
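Interface endpoints are built on AWS PrivateLink. Gateway endpoints are not: according to the AWS documentation they do not use PrivateLink, and instead appear as a target in your VPC route tables.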
I think the difference in pricing comes from how the two are deployed:
An interface endpoint is basically a separate ENI in a subnet, whereas
a gateway endpoint is an attachment at the VPC level that accepts S3/DynamoDB traffic.
Interface endpoints, being ENIs, have an IP address allocated and are under your (the customer's) control. Hence, you are consuming AWS network resources.
The underlying network resources for gateway endpoints, by contrast, are not exposed to you, so you as a customer do not reserve any AWS resources and there is nothing to be charged for.
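The ENI-versus-route-table difference also shows up in how the two endpoint types are requested. The sketch below builds the keyword arguments for the EC2 `CreateVpcEndpoint` call (boto3's `create_vpc_endpoint`); the helper names and every ID passed to them are hypothetical, and `create_endpoint` assumes boto3 is installed and credentials are configured.

```python
def gateway_endpoint_params(vpc_id, region, route_table_ids):
    """Gateway endpoint: attached to the VPC and wired in through route tables."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }


def interface_endpoint_params(vpc_id, region, subnet_ids, security_group_ids):
    """Interface endpoint: an ENI placed in your subnets, guarded by security groups."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
    }


def create_endpoint(params):
    """Send the request; boto3 is third-party, so it is imported only when needed."""
    import boto3

    return boto3.client("ec2").create_vpc_endpoint(**params)


if __name__ == "__main__":
    # Placeholder IDs; nothing is sent to AWS here.
    print(gateway_endpoint_params("vpc-EXAMPLE", "us-east-1", ["rtb-EXAMPLE"]))
    print(interface_endpoint_params("vpc-EXAMPLE", "us-east-1", ["subnet-EXAMPLE"], ["sg-EXAMPLE"]))
```

The gateway request names only route tables, while the interface request names subnets and security groups, i.e. resources in which an ENI with its own IP address will live.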
I'm working with AWS and need some support please.
My team provisioned Direct Connect and we can now enjoy private connectivity from our corporate network to VPC on AWS.
Management is asking whether aws cli commands can be executed through Direct Connect rather than the public internet. We have a lot of scripts with a lot of commands like aws ec2 describe-instances and so on. I guess these call the public REST API that AWS exposes for the EC2 service.
They're asking if it's possible for these calls not to go through the public internet.
I've seen VPC endpoints. Are they the solution?
See How can I access my Amazon S3 bucket over Direct Connect? for how to do this with S3.
Basically:
After BGP is up and established, the Direct Connect router advertises all global public IP prefixes, including Amazon S3 prefixes. Traffic heading to Amazon S3 is routed through the Direct Connect public virtual interface. The public virtual interface is routed through a private network connection between AWS and your data center or corporate network.
You can extend this to other Amazon services, per the AWS Direct Connect FAQs:
All AWS services, including Amazon Elastic Compute Cloud (EC2), Amazon Virtual Private Cloud (VPC), Amazon Simple Storage Service (S3), and Amazon DynamoDB can be used with Direct Connect.
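If you want to check whether a given address falls inside the Amazon S3 prefixes that a public virtual interface would carry, AWS publishes its public ranges as JSON at https://ip-ranges.amazonaws.com/ip-ranges.json. The sketch below works on an already-parsed copy of that file; the function name is hypothetical and the sample data uses documentation-only address ranges, not real AWS prefixes.

```python
import ipaddress


def matching_s3_prefixes(ip, ip_ranges):
    """Return (prefix, region) pairs of S3 IPv4 prefixes that contain `ip`.

    `ip_ranges` is the parsed ip-ranges.json document, whose "prefixes" list
    holds entries with "ip_prefix", "region" and "service" keys.
    """
    addr = ipaddress.ip_address(ip)
    hits = []
    for entry in ip_ranges.get("prefixes", []):
        if entry.get("service") != "S3":
            continue
        if addr in ipaddress.ip_network(entry["ip_prefix"]):
            hits.append((entry["ip_prefix"], entry["region"]))
    return hits


# Made-up sample in the same shape as the real file (RFC 5737 test ranges).
sample = {
    "prefixes": [
        {"ip_prefix": "192.0.2.0/24", "region": "us-east-1", "service": "S3"},
        {"ip_prefix": "198.51.100.0/24", "region": "us-east-1", "service": "EC2"},
    ]
}

if __name__ == "__main__":
    print(matching_s3_prefixes("192.0.2.10", sample))
    print(matching_s3_prefixes("198.51.100.5", sample))
```

To run it against live data, download the JSON file, load it with `json.load`, and pass the result in place of `sample`.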
Refer to @jarmod's answer below for the answer to the question, but read on for why I think this sounds like an XY problem.
There is no reason at all why management should be concerned.
Third-party auditors assess the security and compliance of AWS services as part of multiple AWS compliance programs. Using the AWS CLI to access a service does not alter that service's compliance - AWS has compliance programs which pretty much cover every IT compliance framework out there globally.
Compliance aside, the AWS CLI does not store any customer data (there should be no data protection concerns) & transmits data securely (unless you manually override this).
The user guide highlights this:
The AWS CLI does not itself store any customer data other than the credentials it needs to interact with the AWS services on the user's behalf.
By default, all data transmitted from the client computer running the AWS CLI and AWS service endpoints is encrypted by sending everything through a HTTPS/TLS connection.
You don't need to do anything to enable the use of HTTPS/TLS. It is always enabled unless you explicitly disable it for an individual command by using the --no-verify-ssl command line option.
As if that's not enough, you can also add increased security when communicating with AWS services by enforcing a minimum version of TLS 1.2 to be used by the CLI.
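For scripts that talk to AWS endpoints from Python rather than through the CLI, the same idea of a TLS 1.2 floor can be sketched with the standard library. This is an illustration of the concept only, not how the AWS CLI enforces it; the function name is hypothetical.

```python
import ssl


def tls12_context():
    """Client-side TLS context that refuses anything older than TLS 1.2."""
    ctx = ssl.create_default_context()  # certificate and hostname checks stay on
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


if __name__ == "__main__":
    ctx = tls12_context()
    print(ctx.minimum_version, ctx.verify_mode, ctx.check_hostname)
```

A context like this can be passed to `urllib.request.urlopen(url, context=ctx)` or `http.client.HTTPSConnection(host, context=ctx)`.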
There are much bigger attack vectors to target, like:
The physical accessibility of the device storing the credentials
Permanent access tokens vs. temporary credentials
IAM policies associated with the credentials
The AWS CLI is secure.
This question is inspired by this tweet by someone who accidentally and unexpectedly incurred a large bill due to a NAT gateway.
I'm using EC2 to process terabytes of data from an S3 bucket. The bucket and the instance are in the same region.
My goal is to minimize costs. In particular, I want to pay $0 for S3 data transfer costs. According to the S3 pricing page, this should be possible:
Transfers between S3 buckets or from Amazon S3 to any service(s) within the same AWS Region are free.
My instance is in a VPC, has a public IP address, no NAT gateway, no S3 gateway endpoint.
I observe that over months of doing this, I'm not being charged. Whereas traceroute from a server in a different region shows intermediate hops to the S3 host, the route from a server in the same region shows no intermediate hops to the S3 endpoint. Is this always guaranteed? Could Amazon's DNS resolver one day give me an IP address that requires routing over the public Internet, thus incurring thousands of dollars of fees?
This question seems a bit related, but doesn't really address the core question.
The tweet does not appear to accurately reflect the true nature of the charges they incurred.
(I'm not saying they weren't charged, I'm saying that it isn't correct to describe it as if S3 isn't free in this case, even though the tweet implies that this is the case.)
S3 traffic to/from other services within the same region isn't free with a * -- it's just free.
Transfers between S3 buckets or from Amazon S3 to any service(s) within the same AWS Region are free.
https://aws.amazon.com/s3/pricing/
That doesn't say anything about the routing of the traffic, and the routing of the traffic is not important, because -- back to the tweet -- they would not have been billed those usage charges by Amazon S3.
They would have been billed by Amazon VPC for using a NAT Gateway. What you access through a NAT Gateway isn't relevant, because the "data processing" charge always applies to traffic passing through it.
Data processing charges apply for each Gigabyte processed through the NAT gateway regardless of the traffic’s source or destination. (emphasis added)
https://aws.amazon.com/vpc/pricing/
The NAT Gateway pricing page (including old versions like this one) specifically mentions that accessing S3 through a NAT Gateway is subject to all the charges applicable to NAT Gateway.
Accessing S3 within the same region using either an EC2 instance with a public IP address or using an S3 endpoint does not incur any data transfer charges.
When you access S3 within the region, the traffic -- by the relevant definition -- doesn't leave the region, because objects stored in a given region are always located in the region.
Objects stored in a Region never leave the Region unless you explicitly transfer them to another Region.
https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html
As long as you aren't using a NAT Gateway, or doing something similarly sub-optimal, like accessing S3 by transiting an EC2 NAT Instance or forward proxy (e.g. Squid) in another region (which would result in cross-region traffic charges between your client instance and the NAT Instance or proxy billed by VPC or EC2 -- not S3) then you should not expect to pay for data transfer related to S3 within a region.
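The split between the two line items can be sketched as a small calculation. The function name is hypothetical and the NAT rate is a parameter you must fill in from the VPC pricing page; the value used in the example is a placeholder, not an AWS price.

```python
def same_region_s3_bill(gb, via_nat_gateway, nat_rate_per_gb):
    """Break down charges for moving `gb` gigabytes between S3 and EC2 in one region.

    S3 data transfer is 0 in both cases (same-region transfers are free per the
    S3 pricing page quoted above); the NAT gateway data-processing charge
    applies only when the traffic passes through a NAT gateway.
    """
    nat_processing = gb * nat_rate_per_gb if via_nat_gateway else 0.0
    return {"s3_data_transfer": 0.0, "nat_data_processing": nat_processing}


if __name__ == "__main__":
    placeholder_rate = 0.05  # hypothetical USD per GB, for illustration only
    print(same_region_s3_bill(10_000, via_nat_gateway=False, nat_rate_per_gb=placeholder_rate))
    print(same_region_s3_bill(10_000, via_nat_gateway=True, nat_rate_per_gb=placeholder_rate))
```

In both calls the S3 line stays at zero; only the NAT gateway line changes, and it is billed by VPC rather than by S3.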
When should I use AWS Direct Connect and when AWS Storage Gateway? These services seem similar to me, so what are the use cases for each?
AWS Direct Connect is a network connection between AWS and an on-premises network. The physical connection is an optical fiber link organised through a telco, while Direct Connect provisions the physical port where the fiber connects in an AWS transit center.
AWS Storage Gateway is a storage service that provisions a virtual tape drive, virtual S3 drive or virtual disk that is stored in AWS. It typically runs across a Direct Connect connection.
AWS Direct Connect connects on-premises resources with any AWS service, while AWS Storage Gateway is only used to connect to S3 services, including S3 Glacier.
This is one of the differences.
"AWS Direct Connect is a network service that provides an alternative to using the Internet to connect customer's on-premise sites to AWS" (AWS Docs).
"AWS Storage Gateway is a hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage" (AWS Docs).
Direct Connect creates a private network connection between AWS and on-prem resources, while Storage Gateway enables you to store and retrieve Amazon S3 objects through standard file storage protocols.
Storage Gateway - As the name suggests, this service is used to connect on-premises infra with STORAGE services (specifically S3, FSx, EBS)
Direct Connect - This service is used to connect on-premises infra with any AWS resources (in any region)
If I am downloading an S3 object from an EC2 instance, does this request leave the Amazon network, or is the request made via Internet?
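It does not leave the Amazon network, whether or not the EC2 instance and the S3 bucket are in the same region.
Communication between regions does not cross the public Internet either: per the Amazon VPC FAQ, packets that originate on the AWS network with a destination on the AWS network stay on the AWS global network, except traffic to or from the AWS China Regions.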
You can read more about AWS regions and availability zones here. Communication within the same region happens over low-latency private links:
Availability Zones are connected to each other with fast, private
fiber-optic networking.
See AWS Global Infrastructure.
EDIT
Although data transfer happens over private links within the same region, accessing the API endpoints using the SDK or CLI still requires Internet access. See AWS Regions and Endpoints.
If you're concerned about security in Java SDK, the default client configuration is to use HTTPS for all requests for increased security. (Although individual clients can also override this setting by explicitly including the protocol as part of the endpoint URL when calling AmazonWebServiceClient.setEndpoint(String))
If you're concerned about data transfer cost, data transfer from S3 to EC2 within the same region is free of charge.