AWS EC2 magically routes to S3 using private IP - amazon-web-services

I've deployed an EKS cluster. Each of the EC2 instances has its own public IP, and all of them are attached to the same VPC.
The routing table for each of the EC2 instances' subnets looks as follows:
Destination | Target
----------------------------------------
192.168.0.0/16 | local
0.0.0.0/0 | igw-06d8c484aaba8d136
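VPC route tables pick the most specific matching route (longest prefix match). The minimal sketch below applies that rule to the table above with Python's `ipaddress` module. `lookup_target` is a hypothetical helper, not an AWS API, and `203.0.113.10` is a documentation-only address standing in for an S3 public IP. It shows why, with no endpoint route in the table, traffic to S3's public addresses is handed to the internet gateway.

```python
import ipaddress

# Route table from the question: (destination CIDR, target).
ROUTES = [
    ("192.168.0.0/16", "local"),
    ("0.0.0.0/0", "igw-06d8c484aaba8d136"),
]

def lookup_target(dest_ip, routes=ROUTES):
    """Return the target of the most specific route that contains dest_ip."""
    addr = ipaddress.ip_address(dest_ip)
    best_net, best_target = None, None
    for cidr, target in routes:
        net = ipaddress.ip_network(cidr)
        # Keep the matching route with the longest prefix.
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_target = net, target
    return best_target

# 203.0.113.10 is a hypothetical stand-in for an S3 public IP.
print(lookup_target("203.0.113.10"))  # igw-06d8c484aaba8d136
print(lookup_target("192.168.5.20"))  # local
```

Which route the packets take is a separate question from how they are billed, which is what the answers address.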
So if I understand correctly, every time I read from an S3 bucket (in the same region or not), it should be routed using the internet gateway to the internet, and if that's the case, I should see charges for it.
However, I don't see any charges for it. I thought that maybe the AWS network was magically taking care of this routing, but AWS actually provides another solution for this (VPC endpoints), and no such rules appear in any of my routing tables.
Not that I'm against free services, but I'd prefer to understand this before reading lots of data.

Perhaps you are still in the AWS Free Tier (12 months free), which includes 20,000 GET requests and 2,000 PUT requests for free (source).
EDIT:
As you say, free-tier usage does appear in Cost Explorer.
However, if the EC2 instance and the S3 bucket are in the same region, data transfer should be free regardless of endpoints. If that's your setup, I think that's the explanation.
This article summarises it quite nicely.
I would still recommend setting up an S3 endpoint if you want to keep internal transfer costs down after your free tier expires; it's also great for performance.

Related

Performance test in AWS : How to guarantee Bandwidth

I need to run a performance test against an application based on Elastic Beanstalk in AWS, fronted by an ELB.
I expect traffic to be around 25 Gbit/s.
As per AWS requirements, I am using another account (dedicated to tests) from my AWS organisation.
The application is a production application in another account of my AWS organisation.
My performance test will use the DNS entry of the production website, and it will be executed by EC2 instances in a subnet of a VPC that has an internet gateway.
I have a doubt regarding bandwidth: from the AWS documentation I've read, I don't understand whether there will be a bandwidth limitation or not.
From this answer it seems I may face such issues:
https://stackoverflow.com/a/62344703/9565222
In this case, how can I run a performance test that reflects what happens in production, i.e., passing through the DNS entry that points to the ELB?
Let's say I create a Peering connection between the Test account VPC and the production VPC: what is the max bandwidth?
My test shows that with 3 c5d.9xlarge instances using a VPC Peering connection, I only get around 10 Gbit/s, so that would be the max whatever the number of instances.
Another test shows that with 3 c5d.9xlarge instances using an Internet Gateway, I get varying bandwidth capped at around 12 Gbit/s, but I cannot tell what the real limit is.
So what are my options?
- VPC Peering is not an option
- Internet Gateway from multiple machines may be, but I would like some kind of guarantee
- Are there better options (Transit Gateway?)?
I need to run a performance test against an application based on Elastic Beanstalk in AWS, fronted by an ELB. I expect traffic to be around 25 Gbit/s.
That sounds totally fine, ELB can easily handle 25 Gbps.
Make sure that your test reflects what your production load is going to be like. If your production load is all coming from a very small number of sources, replicate that. If it's coming from a very large number of sources (e.g., lots of users of a client app, each generating a bit of traffic, resulting in a ton of total aggregated traffic), make sure you replicate that. There are differences that may seem nuanced if you're not experienced in this kind of testing, and reproducing the real environment as closely as possible is the easiest way to avoid any of those issues.
For testing with a very large number of relatively low-bandwidth sources, take a look at projects like these:
Bees with Machine Guns
Tsung
I have a doubt regarding bandwidth: from the AWS documentation I've read, I don't understand whether there will be a bandwidth limitation or not.
Some components in AWS have bandwidth limitations, some don't.
Specifically, each EC2 instance has a maximum bandwidth it supports, depending on the instance type. Also, even if a given EC2 instance type supports a certain bandwidth, you need to be sure that the OS running on that instance supports that bandwidth. This usually means ensuring that the correct drivers are being used (on C5 instances, for example, the Elastic Network Adapter (ENA) driver). In my experience, as long as you use the most recent version of Amazon Linux available, everything should "just work".
Also, as I explain in more detail later, VPC Peering Connections and Internet Gateways do not limit bandwidth.
Let's say I create a Peering connection between the Test account VPC and the production VPC: what is the max bandwidth?
VPC Peering Connections are not a bandwidth bottleneck. That is, they don't limit the amount of bandwidth you have across the peering connection.
From the Amazon VPC FAQ:
Q. Are there any bandwidth limitations for peering connections?
Bandwidth between instances in peered VPCs is no different than bandwidth between instances in the same VPC.
[nb: there's a note about placement groups in the FAQs, but you didn't mention that, so I removed it; if you are using that feature, please clarify, as it's most likely something you shouldn't be using anyway, based on what you originally described in the question]
My test shows that with 3 c5d.9xlarge instances using a VPC Peering connection, I only get around 10 Gbit/s
The c5d.9xlarge instance type is limited to 10 Gbps. So if you use that for your test, you won't ever see one instance with more than 10 Gbps.
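As a rough sizing check, the sum of the per-instance caps gives a lower bound on how many load generators a 25 Gbit/s test needs. This is a minimal sketch that assumes the 10 Gbps per-instance figure above; `instances_needed` and its `headroom` parameter are hypothetical. It ignores per-flow limits, OS and driver effects, and the receiving side, so treat the result as a floor, not a guarantee (the question's own test reached only about 10 Gbit/s in total with 3 instances).

```python
import math

def instances_needed(target_gbps, per_instance_gbps, headroom=0.0):
    """Minimum instance count whose summed bandwidth caps reach the target.

    headroom is an optional extra fraction on top of the target (0.5 = +50%).
    """
    if per_instance_gbps <= 0:
        raise ValueError("per_instance_gbps must be positive")
    required = target_gbps * (1 + headroom)
    return math.ceil(required / per_instance_gbps)

print(instances_needed(25, 10))       # 3
print(instances_needed(25, 10, 0.5))  # 4
```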
More info here: Amazon EC2 C5 Instances.
Also, make sure you check out the EC2 C6g instances. I haven't personally used them, but they are supposed to be much faster and lower cost; they were released just 2 days ago.
Another test shows that with 3 c5d.9xlarge instances using an Internet Gateway, I get varying bandwidth capped at around 12 Gbit/s [...]
The Internet Gateway isn't a bandwidth bottleneck. In other words, there's no bandwidth limit imposed by the Internet Gateway.
In fact, there's no "single device" that is an Internet Gateway. Think of it more as a "flag" that tells the VPC networking system that your VPC has a path to and from the Internet.
From the Amazon VPC FAQ:
Q. Are there any bandwidth limitations for Internet gateways? Do I need to be concerned about its availability? Can it be a single point of failure?
No. An Internet gateway is horizontally-scaled, redundant, and highly available. It imposes no bandwidth constraints.
So what are my options? - VPC Peering is not an option - Internet Gateway from multiple machines may be, but I would like some kind of guarantee - Are there better options (Transit Gateway?)?
VPC Peering is probably the best choice here. As I mentioned, it does not limit your bandwidth. Check the other things I mentioned before: the instance type, the OS, the drivers, etc.
Using an Internet Gateway for this implies that, from a routing perspective, your traffic is "leaving AWS" and going "out to the Internet" (even though, physically, it probably won't ever truly leave AWS's physical devices). This means that, from a billing perspective, you'll be charged "Data Transfer Out to the Internet" rates. They are significantly higher than what you'd pay for VPC Peering.
I see no need for a Transit Gateway here, as the scenario you describe is really simple and can be solved with a VPC Peering Connection.

Is traffic from a VPC EC2 instance with a public IP address to an S3 bucket in the same region guaranteed to stay within Amazon's network?

This question is inspired by this tweet by someone who accidentally and unexpectedly incurred a large bill due to a NAT gateway.
I'm using EC2 to process terabytes of data from an S3 bucket. The bucket and the instance are in the same region.
My goal is to minimize costs. In particular, I want to pay $0 for S3 data transfer costs. According to the S3 pricing page, this should be possible:
Transfers between S3 buckets or from Amazon S3 to any service(s) within the same AWS Region are free.
My instance is in a VPC and has a public IP address; there is no NAT gateway and no S3 gateway endpoint.
I observe that over months of doing this, I'm not being charged. Whereas traceroute from a server in a different region shows intermediate hops to the S3 host, the route from a server in the same region shows no intermediate hops to the S3 endpoint. Is this always guaranteed? Could Amazon's DNS resolver one day give me an IP address that requires routing over the public Internet, thus incurring thousands of dollars of fees?
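One way to spot-check the concern about the resolver is to compare the IP it returns for the S3 endpoint against AWS's published `ip-ranges.json`, whose `prefixes` entries carry `ip_prefix`, `region`, and `service` fields. This minimal sketch assumes you have already downloaded and parsed that file; `s3_regions_for_ip` is a hypothetical helper, it only reads the IPv4 `prefixes` list, and the sample below is an invented excerpt using documentation-only addresses. As the answer explains, this tells you where an address is published, not how the traffic is billed.

```python
import ipaddress
import json

def s3_regions_for_ip(ip, ip_ranges):
    """Return the sorted regions whose published S3 IPv4 prefixes contain ip.

    ip_ranges is the parsed contents of ip-ranges.json.
    """
    addr = ipaddress.ip_address(ip)
    regions = set()
    for entry in ip_ranges.get("prefixes", []):
        if entry.get("service") != "S3":
            continue
        if addr in ipaddress.ip_network(entry["ip_prefix"]):
            regions.add(entry["region"])
    return sorted(regions)

# Hypothetical excerpt shaped like ip-ranges.json (documentation-only prefixes).
sample = json.loads("""
{"prefixes": [
  {"ip_prefix": "198.51.100.0/24", "region": "us-east-1", "service": "S3"},
  {"ip_prefix": "203.0.113.0/24", "region": "eu-west-1", "service": "AMAZON"}
]}
""")

print(s3_regions_for_ip("198.51.100.7", sample))  # ['us-east-1']
print(s3_regions_for_ip("203.0.113.7", sample))   # []
```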
This question seems a bit related, but doesn't really address the core question.
The tweet does not appear to accurately reflect the true nature of the charges they incurred.
(I'm not saying they weren't charged, I'm saying that it isn't correct to describe it as if S3 isn't free in this case, even though the tweet implies that this is the case.)
S3 traffic to/from other services within the same region isn't free with a * -- it's just free.
Transfers between S3 buckets or from Amazon S3 to any service(s) within the same AWS Region are free.
https://aws.amazon.com/s3/pricing/
That doesn't say anything about the routing of the traffic, and the routing of the traffic is not important, because -- back to the tweet -- they would not have been billed those usage charges by Amazon S3.
They would have been billed by Amazon VPC for using a NAT Gateway. What you access through a NAT Gateway isn't relevant, because the "data processing" charge always applies to traffic passing through it.
Data processing charges apply for each Gigabyte processed through the NAT gateway regardless of the traffic’s source or destination. (emphasis added)
https://aws.amazon.com/vpc/pricing/
The NAT Gateway pricing page (including old versions like this one) specifically mentions that accessing S3 through a NAT Gateway is subject to all the charges applicable to NAT Gateway.
Accessing S3 within the same region using either an EC2 instance with a public IP address or using an S3 endpoint does not incur any data transfer charges.
When you access S3 within the region, the traffic -- by the relevant definition -- doesn't leave the region, because objects stored in a given region are always located in the region.
Objects stored in a Region never leave the Region unless you explicitly transfer them to another Region.
https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html
As long as you aren't using a NAT Gateway, or doing something similarly sub-optimal, like accessing S3 by transiting an EC2 NAT Instance or forward proxy (e.g. Squid) in another region (which would result in cross-region traffic charges between your client instance and the NAT Instance or proxy billed by VPC or EC2 -- not S3) then you should not expect to pay for data transfer related to S3 within a region.
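The reasoning above can be condensed into a small decision sketch. `likely_charges` and its parameters are hypothetical, the strings are descriptive labels rather than official billing line items, and it models only the cases this answer discusses: cross-region S3 transfer, a cross-region NAT Instance or proxy, and NAT Gateway data processing. S3 request charges (GET, PUT, etc.) are separate and not modeled.

```python
def likely_charges(bucket_region, client_region,
                   via_nat_gateway=False, proxy_region=None):
    """List the transfer-related charge categories that would apply to an S3 download."""
    charges = []
    # If a NAT Instance or proxy is used, S3 sees the request from the proxy's region.
    s3_caller_region = proxy_region or client_region
    if s3_caller_region != bucket_region:
        charges.append("S3 data transfer out to another region")
    if proxy_region is not None and proxy_region != client_region:
        charges.append("EC2/VPC cross-region transfer between client and proxy")
    if via_nat_gateway:
        # Billed by VPC for every GB, whatever the destination.
        charges.append("NAT Gateway data processing")
    return charges

print(likely_charges("us-east-1", "us-east-1"))                        # []
print(likely_charges("us-east-1", "us-east-1", via_nat_gateway=True))  # ['NAT Gateway data processing']
```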

Static IP for outbound API calls

A new api service we use requires that we give them a list of all the IP addresses our calls will be coming from; if we make an api call from any other IP address, the call will fail.
This question has been asked before here, but I'm wondering if in 2019 there is any simpler/easier/lower cost solution.
Our Setup
Elastic Beanstalk, which currently scales to anywhere from 5 to 50 EC2 instances for our web application, based on traffic
An Application Load Balancer
Also have a worker tier, which would be available for use if that might be helpful
Typically these API calls would come from any of our web tier EC2 instances, as the calls are based on user interaction. We can of course set up something different, e.g. have the worker tier make the calls.
Solutions I've Found
Give each EC2 instance an Elastic (static) IP address. This is not a great solution for us, because as we (hopefully) continue to scale, the number of IP addresses needed will keep growing {ref}
Set up two NAT instances (one not being sufficient as it would be a single point of failure). I'm hoping there is something simpler and lower cost than this option. {ref} {ref}
Create new ec2 instances and put them behind a Network Load Balancer. Again, complex and costly. {ref}
Are there any new, easier, less costly solutions? I have never used AWS Lambda before; maybe it is possible to run Lambda functions all from one IP address? I don't have many ideas beyond that at this point. Thanks for your time.
A NAT is the best solution, and shouldn't cost you much more than a web-server.
The simplest way to use a NAT is the NAT Gateway. Pricing depends on region, but it's around $0.05/hour, which is a little more than the price of a t3.medium EC2 instance. You're also charged a per-GB rate for data, which can add up quickly. On the positive side, Amazon manages the infrastructure for you, including patches and high-availability.
A NAT Instance is an EC2 instance running a specially-configured AMI. You could probably get away with running this on a t3.micro instance, at $0.01 per hour, which is probably much less than any of your webservers. You will be responsible for applying patches and waking up in the middle of the night if anything goes wrong.
You can probably get away with a single NAT of either type. You will pay for cross-AZ traffic by doing this ($0.01/GB), so it will be a false economy if you move a lot of data across the NAT. It's a toss-up whether you'll get higher availability from two NATs, because you can only reference one at a time in your routing tables. So if one goes down, you'll have to update the routing tables to point at the other, which will probably take as much time as bringing up a new instance.
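To compare the two options, a rough monthly estimate can be built from the figures above: about $0.05/hour for a NAT Gateway, about $0.01/hour for a t3.micro NAT Instance, and $0.01/GB cross-AZ. This is a minimal sketch; `monthly_nat_cost` is a hypothetical helper, the NAT Gateway per-GB processing rate and the traffic volumes are placeholders to replace with your region's pricing and your own usage, and it ignores Data Transfer Out to the Internet, which applies with either option.

```python
HOURS_PER_MONTH = 730  # common approximation of the hours in a month

def monthly_nat_cost(hourly_rate, gb_through_nat, per_gb_processing=0.0,
                     gb_cross_az=0.0, cross_az_per_gb=0.01):
    """Rough monthly NAT cost: hourly charge + per-GB processing + cross-AZ transfer."""
    return (HOURS_PER_MONTH * hourly_rate
            + gb_through_nat * per_gb_processing
            + gb_cross_az * cross_az_per_gb)

# HYPOTHETICAL placeholder; look up the real NAT Gateway data processing rate.
HYPOTHETICAL_NAT_GW_PER_GB = 0.045

# Hypothetical usage: 500 GB/month through one NAT, 250 GB of it from another AZ.
gateway = monthly_nat_cost(0.05, gb_through_nat=500,
                           per_gb_processing=HYPOTHETICAL_NAT_GW_PER_GB,
                           gb_cross_az=250)
instance = monthly_nat_cost(0.01, gb_through_nat=500, gb_cross_az=250)

print(f"NAT Gateway:  ${gateway:.2f}/month")
print(f"NAT Instance: ${instance:.2f}/month")
```

The per-GB term is what makes the NAT Gateway "add up quickly": it scales with every GB you push through it, while the NAT Instance's cost is dominated by its hourly rate.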
You can't use a Lambda, because it needs to have a permanent IP address assignment and you can't control that with Lambda. You could write your own proxy server, running on EC2, but the costs for that are the same as a NAT Instance.
Here is prescriptive guidance from AWS: https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/generate-a-static-outbound-ip-address-using-a-lambda-function-amazon-vpc-and-a-serverless-architecture.html
"This pattern describes how to generate a static outbound IP address in the Amazon Web Services (AWS) Cloud by using a serverless architecture..."
Essentially, you have an AWS Lambda function that uses an Elastic IP address as the outbound IP address. In the guidance, you will create "a Lambda function and a virtual private cloud (VPC) that routes outbound traffic through an internet gateway with a static IP address. To use the static IP address, you attach the Lambda function to the VPC and its subnets. "

Getting AWS Data Transfer charges to regions outside my servers region

My EC2 instance is getting charged for data transfer from almost every available AWS region (Tokyo, Seoul, Singapore, Paris, London, Germany, Ireland, Ohio, Oregon, Sydney, Canada Central, Sao Paulo, CloudFront, INCLUDING AWS GovCloud (US)). 99.99% of our users are from India. As recommended by an AWS representative, we have checked that no other scripts are running on our instance and have changed the security group inbound rules to allow only SSH on port 22 from a static IP. But there is still almost 600 GB+ of data transfer, and the security group documentation doesn't help much. Is there any other way to stop this data transfer?
Please note that the EC2 instance runs the PHP code and a Java API (Tomcat 7) service, and RDS is on another instance.
First, your question reads as if you mean data transfer to other EC2 instances in another region. Perhaps you meant traffic to internet users in other regions?
Second, according to the pricing page, you'll be billed for Data Transfer OUT From Amazon EC2 To Internet whichever region you are in, and regardless of whether the endpoint is on the Internet or in an AWS region. So, even if your users are in Mumbai, you'll be billed for outbound traffic anyway.
Third, if you want to block traffic on a per-country basis, use a CDN with that capability, e.g. CloudFlare.
Please elaborate on your question if you meant something else.

If I am downloading an S3 object from an EC2 instance, does this request leave the Amazon network?

If I am downloading an S3 object from an EC2 instance, does this request leave the Amazon network, or is the request made via Internet?
It depends whether the EC2 instance and S3 bucket are in the same region or not.
All communication between regions is across the public Internet.
You can read more about AWS regions and availability zones here. Communication within the same region happens over low-latency private links:
Availability Zones are connected to each other with fast, private fiber-optic networking.
See AWS Global Infrastructure.
EDIT
Although data transfer happens over private links within the same region, accessing the API endpoints using the SDK or CLI still requires Internet access. See AWS Regions and Endpoints.
If you're concerned about security with the Java SDK, the default client configuration uses HTTPS for all requests. (Individual clients can override this setting by explicitly including the protocol in the endpoint URL when calling AmazonWebServiceClient.setEndpoint(String).)
If you're concerned about data transfer cost, data transfer from S3 to EC2 within the same region is free of charge.