When you navigate to a file uploaded to S3, you'll see its URL in a format like this (in this example the bucket name is example and the file is hello.txt):
https://example.s3.us-west-2.amazonaws.com/hello.txt
Notice that the region, us-west-2, is embedded in the domain.
I accidentally tried accessing the same URL without the region and noticed that it worked too:
https://example.s3.amazonaws.com/hello.txt
It seems much simpler to use these shorter URLs rather than the longer ones as I don't need to pass around the region.
Are there any advantages/disadvantages of excluding the region from the domain? Or are the two domains the same?
This is a deprecated feature of Amazon S3 known as Global Endpoints. Some regions support the global endpoint for backward compatibility purposes. AWS recommends that you use the standard endpoint syntax in the future.
For regions that support the global endpoint, your request is redirected to the standard endpoint. By default, Amazon routes global endpoint requests to the us-east-1 region. For buckets in supported regions other than us-east-1, Amazon S3 updates the DNS record for future requests (note that DNS updates require 24-48 hours to propagate); until then, Amazon redirects the request to the correct region using an HTTP 307 Temporary Redirect.
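You can observe this redirect yourself. The sketch below assumes a hypothetical bucket named example in us-west-2 and uses the third-party requests library; it disables automatic redirect following so the 307 and its Location header stay visible:

```python
# A minimal sketch to observe the legacy global-endpoint redirect.
# Assumes a hypothetical bucket "example" living in us-west-2 and the
# third-party "requests" library; adjust names to your own bucket.
import requests

legacy_url = "https://example.s3.amazonaws.com/hello.txt"

# Don't follow redirects automatically, so we can see the 307 itself.
resp = requests.get(legacy_url, allow_redirects=False)

print(resp.status_code)              # e.g. 307 until the per-bucket DNS record has propagated
print(resp.headers.get("Location"))  # a regional URL such as
                                     # https://example.s3.us-west-2.amazonaws.com/hello.txt
```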
Are there any advantages/disadvantages of excluding the region from the domain? Or are the two domains the same?
The domains are not the same.
Advantages to using the legacy global endpoint: the URL is shorter.
Disadvantages: the request must be redirected and is, therefore, less efficient. Further, if you create a bucket in a region that does not support global endpoints, AWS will return an HTTP 400 Bad Request error response.
TLDR: It is a best practice to use the standard (regional) S3 endpoint syntax.
Related
I was wondering how GetBucketLocation works. Is there a centralized store that holds all the bucket-to-location mappings? Buckets created in Regions launched before March 20, 2019 are reachable via https://bucket.s3.amazonaws.com. So if I have a bucket and use https://bucket.s3.amazonaws.com/xxxxx to access it, will S3 query the centralized mapping store for the region and then route my request to the correct region?
There's a centralized database in us-east-1 and all the other regions have replicas of it. This is used for the GET bucket location API call as well as List Buckets.
But this isn't used for request routing.
Request routing is a simple system -- the database is DNS. There's a DNS record automatically created for every single bucket -- a CNAME to an S3 endpoint in the bucket's region.
There's also a *.s3.amazonaws.com DNS wildcard that points to us-east-1... so these hostnames work immediately when the new bucket is in us-east-1. Otherwise there's a delay until the specific bucket record is created, overriding the wildcard; requests sent to that endpoint in the meantime will arrive at us-east-1, which will respond with an HTTP redirect to an appropriate regional endpoint for the bucket.
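You can see this DNS machinery from any machine with a resolver. A small sketch (the bucket name is a hypothetical placeholder) asks the system resolver what sits behind the global-endpoint hostname; for a bucket whose per-bucket record has propagated, the canonical name points at a regional S3 endpoint:

```python
# A small sketch of the DNS lookup behind bucket.s3.amazonaws.com.
# "example" is a hypothetical bucket name; substitute your own.
import socket

hostname = "example.s3.amazonaws.com"

# gethostbyname_ex returns (canonical_name, alias_list, ip_addresses).
# If a per-bucket CNAME exists, the canonical name reveals the regional
# endpoint that has overridden the *.s3.amazonaws.com wildcard.
canonical, aliases, addresses = socket.gethostbyname_ex(hostname)

print("canonical name:", canonical)
print("aliases:       ", aliases)
print("addresses:     ", addresses)
```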
Why they might have stopped doing this for new regions is presumably related to scaling considerations, and the fact that it's no longer as useful as it once was. The ${bucket}.s3.amazonaws.com URL style became largely irrelevant when mandatory Signature Version 4 authentication became the rule for regions launched in 2014 and later, because you can't generate a valid Sig V4 URL without knowing the target region of the request. Signature V2 signing didn't require the region to be known to the code generating a signature.
S3 also didn't historically have consistent hostnames for regional endpoints. For example, in us-west-2, the regional endpoints used to be ${bucket}.s3-us-west-2.amazonaws.com but in us-east-2, the regional endpoints have always been ${bucket}.s3.us-east-2.amazonaws.com... did you spot the difference? After s3 there was a - rather than a ., so constructing a regional URL also required knowledge of the random rules for different regions. Even more random was that region-specific endpoints for us-east-1 were actually ${bucket}.s3-external-1.amazonaws.com unless, of course, you had a reason to use ${bucket}.s3-external-2.amazonaws.com. (There was a legacy reason for this -- it made sense at the time, but it was a long time ago.)
To their credit, they fixed this so that all regions now support ${bucket}.s3.${region}.amazonaws.com, and (also to their credit) the old URLs still work in older regions, even though standardization is now in place.
I have signed up for Amazon Web Services and created a static website via the Amazon S3 service (created a bucket and mapped a domain to that bucket).
This service looks great, but I have one problem: I don't know how to block bad bots and prevent them from wasting my bandwidth (you all know that Amazon charges for bandwidth).
Amazon Web Services doesn't support .htaccess, and I have no idea how to block them.
What I need is to block the bad bots in two ways:
Via Bot Name, e.g.: BadBot1
Via Bot IP, e.g.: 185.11.240.175
Can you please help me to do it?
Your S3 bucket policy will definitely allow you to block specified IP addresses, but there is a size limitation (~20 KB) on bucket policies, which would probably make maintaining a policy that restricts disreputable IP addresses infeasible.
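For completeness, here is a rough sketch of what such a deny-by-IP bucket policy could look like, applied with boto3. The bucket name is a placeholder and the IP is the example from the question; keep the ~20 KB policy size limit in mind before adding many addresses this way.

```python
# Sketch: deny GETs from specific IPs with a bucket policy (boto3 assumed).
# The bucket name and the blocked IP are hypothetical placeholders.
import json
import boto3

BUCKET = "my-static-site-bucket"
BLOCKED_IPS = ["185.11.240.175/32"]

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyBadBotIPs",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            # Deny requests whose source IP falls in any of the listed CIDRs.
            "Condition": {"IpAddress": {"aws:SourceIp": BLOCKED_IPS}},
        }
    ],
}

s3 = boto3.client("s3")
s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```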
AWS's WAF & Shield service, fronted by CloudFront, is the most powerful way AWS provides to block IPs, and you could easily integrate this with an S3 origin. CloudFront allows you to plug in a WAF & Shield ACL, which is composed of rules that allow or disallow sets of IPs that you define.
AWS has some sample Lambda functions here that you can use as a starting point. You would probably want a Lambda function to run on a schedule, obtain the list of IPs that you want to block, parse that list, and add new IPs found to your WAF's IP sets (or remove ones no longer on the list). The waf-tor-blocking and waf-reputation-lists functions in the above link provide good examples for how to do this.
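As a rough sketch of that scheduled function, here is what updating an IP set might look like with the newer WAFv2 API (the sample functions linked above target the older WAF Classic API). The IP-set name, ID, and the fetch_blocklist helper are assumptions for illustration:

```python
# Sketch of a scheduled Lambda that syncs a blocklist into a WAFv2 IP set.
# IP_SET_NAME, IP_SET_ID and fetch_blocklist() are hypothetical placeholders.
import boto3

# IP sets with CLOUDFRONT scope must be managed through us-east-1.
wafv2 = boto3.client("wafv2", region_name="us-east-1")

IP_SET_NAME = "bad-bot-ips"
IP_SET_ID = "00000000-0000-0000-0000-000000000000"


def fetch_blocklist():
    """Placeholder: return the current list of CIDRs you want blocked."""
    return ["185.11.240.175/32"]


def handler(event, context):
    # WAFv2 updates need the current LockToken for optimistic locking.
    current = wafv2.get_ip_set(Name=IP_SET_NAME, Scope="CLOUDFRONT", Id=IP_SET_ID)
    lock_token = current["LockToken"]

    # Replace the set's contents with the freshly fetched blocklist.
    wafv2.update_ip_set(
        Name=IP_SET_NAME,
        Scope="CLOUDFRONT",
        Id=IP_SET_ID,
        Addresses=fetch_blocklist(),
        LockToken=lock_token,
    )
```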
I'm not sure exactly what you mean by detecting Bot Name, but the standard WAF & Shield approach is currently to parse CloudFront logs sent to an S3 bucket. Your S3 bucket would trigger SNS or a Lambda function directly whenever it receives a new gzipped log file. The Lambda function will then download that file, parse it for malicious requests, and block the associated IP addresses. The waf-block-bad-behaving and waf-reactive-blacklist functions in the repo I linked to provide examples of how you would approach this. Occasionally you will see signatures for bad bots in the user-agent string of the request. The CloudFront logs will show the user-agent string, so you could potentially parse that and block associated IPs accordingly.
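A rough sketch of that log-parsing piece is below. The log bucket, key, and bad-bot signature list are assumptions; CloudFront standard logs are gzipped, tab-separated, and carry a #Fields: header naming the columns, which the sketch uses to find the User-Agent and client-IP fields:

```python
# Sketch: pull a gzipped CloudFront log from S3 and collect client IPs
# whose User-Agent matches a crude bad-bot signature list.
# LOG_BUCKET, LOG_KEY and BAD_BOT_SIGNATURES are hypothetical placeholders.
import gzip
import boto3

LOG_BUCKET = "my-cloudfront-logs"
LOG_KEY = "dist-id.2024-01-01-00.abcdef.gz"
BAD_BOT_SIGNATURES = ["badbot1", "scrapy"]

s3 = boto3.client("s3")
body = s3.get_object(Bucket=LOG_BUCKET, Key=LOG_KEY)["Body"].read()
lines = gzip.decompress(body).decode("utf-8").splitlines()

fields = []
bad_ips = set()
for line in lines:
    # The "#Fields:" header line lists the column names for the data rows.
    if line.startswith("#Fields:"):
        fields = line[len("#Fields:"):].split()
        continue
    if line.startswith("#") or not fields:
        continue
    row = dict(zip(fields, line.split("\t")))
    user_agent = row.get("cs(User-Agent)", "").lower()
    if any(sig in user_agent for sig in BAD_BOT_SIGNATURES):
        bad_ips.add(row.get("c-ip"))

print(bad_ips)  # feed these into your WAF IP set, as in the sketch above
```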
On Amazon S3, you can restrict access to buckets by domain.
But as far as I understand from a helpful Stack Overflow user, you cannot do this on CloudFront. But why? If I am correct, CloudFront only allows time-based restrictions or IP restrictions (--> so I need to know the IPs of random visitors...?) Or am I missing something?
Here is a quote from S3 documentation that suggests that per-domain restriction is possible:
---> " To allow read access to these objects from your website, you can add a bucket policy that allows s3:GetObject permission with a condition, using the aws:referer key, that the get request must originate from specific webpages. "
--> Is there a way to make this method work on CloudFront as well? Or why is something like this not available on CloudFront?
--> Is there a similar service where this is possible, easier to setup?
Using CloudFront along with WAF (Web Application Firewall), you can restrict requests based on IP address, referrers, or domains.
Here is an AWS blog tutorial on restricting "hotlinking":
https://blogs.aws.amazon.com/security/post/Tx2CSKIBS7EP1I5/How-to-Prevent-Hotlinking-by-Using-AWS-WAF-Amazon-CloudFront-and-Referer-Checkin
In this example, it prohibits requests where the Referer header does not match a specific domain.
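That blog post predates WAFv2, but the same idea translates to the current API. Here is a rough sketch of a CloudFront-scoped web ACL that blocks any request whose Referer header does not contain your domain (the ACL name, metric names, and example.com are placeholders):

```python
# Sketch: a WAFv2 web ACL for CloudFront that blocks requests whose
# Referer header does not contain a given domain (hotlink prevention).
# Names, metric names and "example.com" are hypothetical placeholders.
import boto3

wafv2 = boto3.client("wafv2", region_name="us-east-1")  # CLOUDFRONT scope lives in us-east-1

referer_rule = {
    "Name": "block-missing-referer",
    "Priority": 0,
    "Statement": {
        # Match requests whose Referer does NOT contain the allowed domain.
        "NotStatement": {
            "Statement": {
                "ByteMatchStatement": {
                    "SearchString": b"example.com",
                    "FieldToMatch": {"SingleHeader": {"Name": "referer"}},
                    "TextTransformations": [{"Priority": 0, "Type": "LOWERCASE"}],
                    "PositionalConstraint": "CONTAINS",
                }
            }
        }
    },
    "Action": {"Block": {}},
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "blockMissingReferer",
    },
}

wafv2.create_web_acl(
    Name="hotlink-protection",
    Scope="CLOUDFRONT",
    DefaultAction={"Allow": {}},  # allow everything the rule doesn't block
    Rules=[referer_rule],
    VisibilityConfig={
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "hotlinkProtection",
    },
)
```

The returned web ACL then has to be associated with your CloudFront distribution before it takes effect.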
I have an application which is a static website builder. Users can create their websites and publish them to their custom domains. I am using Amazon S3 to host these sites and an nginx proxy server to route requests to the S3 buckets hosting the sites.
I am facing a load-time issue. Since S3 is not specifically associated with any region and the content is entirely HTML, there shouldn't ideally be any delay. I have a few CSS and JS files which are not too heavy.
What optimization techniques can improve performance? For example, will setting headers or leveraging caching help? I have added an image of a Pingdom analysis for reference.
Also, I cannot use CloudFront because when a user updates an image, the edge locations take a few minutes before the new image is reflected. It is not an instant update, which rules it out for me. Any suggestions on improving this?
S3 HTTPS access from a different region is extremely slow, especially the TLS handshake. To solve the problem we built an Nginx S3 proxy, which can be found on the web. S3 is best as an origin source but not as a transport endpoint.
By the way, try to avoid using your "folder" as a subdomain; instead specify the S3 regional(!) endpoint with the long version of the endpoint URL, and never use https://s3.amazonaws.com
A good example that reduces the number of DNS lookups is the following:
https://s3-eu-west-1.amazonaws.com/folder/file.jpg
Your S3 buckets are associated with a specific region that you can choose when you create them. They are not geographically distributed. Please see AWS doc about S3 regions: https://aws.amazon.com/s3/faqs/
As we can see in your screenshot, it looks like your bucket is located in Singapore (ap-southeast-1).
Are your clients located in Asia? If they are not, you should try to create buckets nearer to them, in order to reduce data access latency.
As for CloudFront, it should be possible to use it if you invalidate your objects, or just use new filenames for each modification, as tedder42 suggested.
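If the only blocker is stale images, an invalidation after each upload is straightforward. A rough sketch with boto3 (the distribution ID and object path are placeholders):

```python
# Sketch: invalidate an updated object in CloudFront after re-uploading it.
# DISTRIBUTION_ID and the object path are hypothetical placeholders.
import time
import boto3

cloudfront = boto3.client("cloudfront")
DISTRIBUTION_ID = "E1234567890ABC"

cloudfront.create_invalidation(
    DistributionId=DISTRIBUTION_ID,
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/images/logo.png"]},
        # CallerReference must be unique per invalidation request.
        "CallerReference": str(time.time()),
    },
)
```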
I'm using Amazon's Simple Storage Service (S3). I noticed that others like Trello were able to configure a subdomain for their S3 links. In the following link they have trello-attachments as the subdomain.
https://trello-attachments.s3.amazonaws.com/.../.../..../file.png
Where can I configure this?
You don't have to configure it.
All buckets work that way if there are no dots in the bucket name and it's otherwise a hostname made up of valid characters. If your bucket isn't in the "US-Standard" region, you may have to use the correct endpoint instead of ".s3.amazonaws.com" to avoid a redirect (or to make it work at all).
An ordinary Amazon S3 REST request specifies a bucket by using the first slash-delimited component of the Request-URI path. Alternatively, you can use Amazon S3 virtual hosting to address a bucket in a REST API call by using the HTTP Host header. In practice, Amazon S3 interprets Host as meaning that most buckets are automatically accessible (for limited types of requests) at http://bucketname.s3.amazonaws.com.
— http://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html
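In other words, both URL styles reach the same object. A small sketch (bucket, region, and key are hypothetical placeholders) showing the two forms side by side:

```python
# Sketch: the two equivalent ways of addressing the same S3 object.
# Bucket, region and key are hypothetical placeholders.
bucket = "my-bucket"   # works as a hostname label: no dots, valid characters
region = "us-east-1"
key = "path/to/file.png"

# Path-style: the bucket is the first slash-delimited segment of the path.
path_style = f"https://s3.{region}.amazonaws.com/{bucket}/{key}"

# Virtual-hosted style: the bucket is carried in the Host header.
virtual_hosted = f"https://{bucket}.s3.{region}.amazonaws.com/{key}"

print(path_style)
print(virtual_hosted)
```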