I was wondering how GetBucketLocation works. Is there a centralized store that saves all the bucket-to-location mappings? Buckets created in Regions launched before March 20, 2019 are reachable via https://bucket.s3.amazonaws.com. So if I have a bucket and I use https://bucket.s3.amazonaws.com/xxxxx to access it, will S3 query the centralized mapping store for the region and then route my request to the correct region?
There's a centralized database in us-east-1 and all the other regions have replicas of it. This is used for the GET bucket location API call as well as List Buckets.
But this isn't used for request routing.
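For reference, the lookup that database serves is exposed as the GetBucketLocation call; a minimal boto3 sketch (the bucket name is a placeholder):

```python
import boto3

s3 = boto3.client("s3")

# GetBucketLocation consults that replicated bucket-metadata store.
resp = s3.get_bucket_location(Bucket="example-bucket")  # placeholder bucket name

# Buckets in us-east-1 report an empty/None LocationConstraint;
# everything else reports its region code (e.g. "eu-west-1").
print(resp["LocationConstraint"] or "us-east-1")
```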
Request routing is a simple system -- the database is DNS. There's a DNS record automatically created for every single bucket -- a CNAME to an S3 endpoint in the bucket's region.
There's also a *.s3.amazonaws.com DNS wildcard that points to us-east-1... so these hostnames work immediately when the new bucket is in us-east-1. Otherwise there's a delay until the specific bucket record is created, overriding the wildcard, and requests sent to that endpoint in the meantime will arrive at us-east-1, which will respond with an HTTP redirect to an appropriate regional endpoint for the bucket.
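You can watch this happen from the client side; a rough sketch with the requests library (the bucket name is a placeholder, and the exact status code depends on whether the per-bucket DNS record exists yet):

```python
import requests

# HEAD the legacy global endpoint without following redirects.
r = requests.head("https://example-bucket.s3.amazonaws.com/", allow_redirects=False)

print(r.status_code)                         # 3xx while the request is being redirected
print(r.headers.get("Location"))             # regional endpoint, when S3 redirects
print(r.headers.get("x-amz-bucket-region"))  # region hint S3 generally includes
```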
Why they might have stopped doing this for new regions is presumably related to scaling considerations, and the fact that it's no longer as useful as it once was. The ${bucket}.s3.amazonaws.com URL style became largely irrelevant when mandatory Signature Version 4 authentication became the rule for regions launched in 2014 and later, because you can't generate a valid Sig V4 URL without knowing the target region of the request. Signature V2 signing didn't require the region to be known to the code generating a signature.
S3 also didn't historically have consistent hostnames for regional endpoints. For example, in us-west-2, the regional endpoints used to be ${bucket}.s3-us-west-2.amazonaws.com but in us-east-2, the regional endpoints have always been ${bucket}.s3.us-east-2.amazonaws.com... did you spot the difference? After s3 there was a - rather than a . so constructing a regional URL also required knowledge of the random rules for different regions. Even more random was that region-specific endpoints for us-east-1 were actually ${bucket}.s3-external-1.amazonaws.com unless, of course, you had a reason to use ${bucket}.s3-external-2.amazonaws.com. (There was a legacy reason for this -- it made sense at the time, but it was a long time ago.)
To their credit, they fixed this so that all regions now support ${bucket}.s3.${region}.amazonaws.com and yet (also to their credit) the old URLs also still work in older regions, even though standardization is now in place.
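If you're constructing these URLs yourself today, the standardized form is simple to build; a trivial sketch (bucket, region, and key are placeholders):

```python
def virtual_hosted_url(bucket: str, region: str, key: str) -> str:
    # Modern, region-consistent virtual-hosted-style URL that works in every
    # region, unlike the legacy s3-{region} form.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"

print(virtual_hosted_url("example-bucket", "us-west-2", "hello.txt"))
```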
Related
When you navigate to a file uploaded to S3, you'll see its URL in a format like this (in this example the bucket name is example and the file is hello.txt):
https://example.s3.us-west-2.amazonaws.com/hello.txt
Notice that the region, us-west-2, is embedded in the domain.
I accidentally tried accessing the same url without the region, and noticed that it worked too:
https://example.s3.amazonaws.com/hello.txt
It seems much simpler to use these shorter URLs rather than the longer ones as I don't need to pass around the region.
Are there any advantages/disadvantages of excluding the region from the domain? Or are the two domains the same?
This is a deprecated feature of Amazon S3 known as Global Endpoints. Some regions support the global endpoint for backward compatibility purposes. AWS recommends that you use the standard endpoint syntax in the future.
For regions that support the global endpoint, your request is redirected to the standard endpoint. By default, Amazon routes global endpoint requests to the us-east-1 region. For buckets in supported regions other than us-east-1, Amazon S3 updates the DNS record for future requests (note that DNS updates can take 24-48 hours to propagate). Amazon then redirects the request to the correct region using an HTTP 307 Temporary Redirect.
Are there any advantages/disadvantages of excluding the region from the domain? Or are the two domains the same?
The domains are not the same.
Advantages to using the legacy global endpoint: the URL is shorter.
Disadvantages: the request must be redirected and is, therefore, less efficient. Further, if you create a bucket in a region that does not support global endpoints, AWS will return an HTTP 400 Bad Request error response.
TLDR: It is a best practice to use the standard (regional) S3 endpoint syntax.
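In practice this just means telling your SDK the bucket's region so it talks to the regional endpoint directly; a minimal boto3 sketch (the bucket, key, and region are taken from the example above and are otherwise placeholders):

```python
import boto3
from botocore.client import Config

# Pin the client to the bucket's region so requests (and SigV4 signatures)
# target the regional endpoint rather than the legacy global one.
s3 = boto3.client(
    "s3",
    region_name="us-west-2",
    config=Config(s3={"addressing_style": "virtual"}),
)

url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "example", "Key": "hello.txt"},
    ExpiresIn=3600,
)
print(url)  # https://example.s3.us-west-2.amazonaws.com/hello.txt?...
```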
In API Gateway I've created a custom domain, foo.example.com, which creates a CloudFront distribution with that CNAME.
I also want to create a wildcard domain, *.example.com, but when attempting to create it, CloudFront throws an error:
CNAMEAlreadyExistsException: One or more of the CNAMEs you provided
are already associated with a different resource
AWS in its docs states that:
However, you can add a wildcard alternate domain name, such as
*.example.com, that includes (that overlaps with) a non-wildcard alternate domain name, such as www.example.com. Overlapping domain
names can be in the same distribution or in separate distributions as
long as both distributions were created by using the same AWS account.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/CNAMEs.html#alternate-domain-names-wildcard
So I might have misunderstood this; is it possible to accomplish what I've described?
This is very likely to be a side-effect of your API Gateway endpoint being configured as Edge Optimized instead of Regional, because with an edge-optimized API, there is a hidden CloudFront distribution provisioned automatically... however, the CloudFront distribution associated with your API is not owned by your account, but rather by an account associated with API Gateway.
Edge-optimized APIs are endpoints that are accessed through a CloudFront distribution that is created and managed by API Gateway.
— Amazon API Gateway Supports Regional API Endpoints
This creates a conflict that prevents the wildcard distribution from being created.
Subdomains that mask a wildcard are not allowed to cross AWS account boundaries, because this would potentially allow traffic for a wildcard distribution's matching domains to be hijacked by creating a more specific alternate domain name -- but, as you noted from the documentation, you can do this within your own account.
Redeploying your API as Regional instead of Edge Optimized is the likely solution. If you still want the edge optimization behavior, you can create another CloudFront distribution with that specific subdomain for use with the API. This would be allowed, because you would own the distribution. Regional APIs are still globally accessible.
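If you go the Regional route, the custom domain is created with the REGIONAL endpoint type; a hedged boto3 sketch (the domain, region, and certificate ARN are placeholders):

```python
import boto3

apigw = boto3.client("apigateway", region_name="us-east-1")  # region is illustrative

# A REGIONAL custom domain avoids the hidden, API Gateway-owned CloudFront
# distribution that was claiming the CNAME.
resp = apigw.create_domain_name(
    domainName="foo.example.com",
    regionalCertificateArn="arn:aws:acm:us-east-1:123456789012:certificate/placeholder",
    endpointConfiguration={"types": ["REGIONAL"]},
)

# Point your DNS (e.g. a Route 53 alias) at this hostname.
print(resp["regionalDomainName"])
```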
Yes, it is. But keep in mind that CNAMEs set for CloudFront distributions are validated to be globally unique, and that includes the distributions API Gateway creates. So this means you (or some other account) already have that CNAME set up somewhere. Currently there is no way to look up where the conflict is; you may need to raise a ticket with AWS Support if you can't find it yourself.
I have two buckets a and b with static websites enabled that redirect to original buckets A and B. I created two Route 53 record sets (A records), slave-1 and slave-2, pointing to buckets a and b respectively. I then created a Master record set (A record) with failover routing, slave-1 as primary and slave-2 as secondary. When I try to access the S3 contents using the Master record, I get a 404 'No Such Bucket' error. Is there a way to get this setup to work, or are there any workarounds for configurations like this?
S3 only supports accessing a bucket either via one of the endpoint hostnames directly (such as example-bucket.s3.amazonaws.com) or via a DNS record pointing to the bucket endpoint when the name of the bucket matches the entire hostname presented in the Host: header (the hostname my-bucket.example.com works only with a bucket named exactly "my-bucket.example.com").
If your tool will be signing requests for the bucket, there is no simple and practical workaround, since the signatures will not match on the request. (This technically could be done with a proxy that has knowledge of the keys and secrets, validates the original signature, strips it, then re-signs the request, but this is a complex solution.)
If you simply need to fetch content from the buckets, then use CloudFront. When CloudFront is configured in front of a bucket, you can point a domain name to CloudFront, and specify one or more buckets to handle the requests, based on pattern matching in the request paths. In this configuration, the bucket names and regions are unimportant and independent of the hostname associated with the CloudFront distribution.
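To illustrate, here is the routing-relevant slice of a CloudFront distribution config expressed as a Python dict; the field names follow the CloudFront API, but the bucket names and path pattern are placeholders, and a real DistributionConfig needs many more required fields:

```python
# Sketch only: two S3 origins plus a path-pattern behavior. Requests matching
# /b/* go to bucket-b; everything else falls through to the default cache
# behavior, which would target bucket-a.
routing_slice = {
    "Origins": {
        "Quantity": 2,
        "Items": [
            {
                "Id": "bucket-a",
                "DomainName": "bucket-a.s3.us-east-1.amazonaws.com",
                "S3OriginConfig": {"OriginAccessIdentity": ""},
            },
            {
                "Id": "bucket-b",
                "DomainName": "bucket-b.s3.eu-west-1.amazonaws.com",
                "S3OriginConfig": {"OriginAccessIdentity": ""},
            },
        ],
    },
    "CacheBehaviors": {
        "Quantity": 1,
        "Items": [
            {
                "PathPattern": "/b/*",
                "TargetOriginId": "bucket-b",
                "ViewerProtocolPolicy": "redirect-to-https",
            }
        ],
    },
}
```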
I have signed up for Amazon Web Services and created a static website via the Amazon S3 service (created a bucket and mapped a domain to that bucket).
This service looks great, but I have one problem - I don't know how to block bad bots and prevent them from wasting my bandwidth (you all know that Amazon charges for bandwidth).
Amazon Web Services doesn't support .htaccess and I have no idea how to block them.
What I need is to block the bad bots in two ways:
Via Bot Name, e.g.: BadBot1
Via Bot IP, e.g.: 185.11.240.175
Can you please help me to do it?
Your S3 bucket policy will definitely allow you to block specified IP addresses, but bucket policies have a size limit (~20 KB), which would probably make maintaining a policy that blocks every disreputable IP address unfeasible.
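For a small, known set of addresses, the bucket-policy route looks roughly like this (a sketch; the bucket name is a placeholder and the IP is the one from the question):

```python
import json
import boto3

s3 = boto3.client("s3")

# Deny object reads from specific source IPs via the bucket policy.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyBadBotIPs",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-site-bucket/*",  # placeholder bucket
            "Condition": {"IpAddress": {"aws:SourceIp": ["185.11.240.175/32"]}},
        }
    ],
}

s3.put_bucket_policy(Bucket="example-site-bucket", Policy=json.dumps(policy))
```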
AWS's WAF & Shield service, attached to a CloudFront distribution, is the most powerful way AWS provides to block IPs, and you can easily integrate this with an S3 origin. CloudFront allows you to plug in a WAF web ACL, which is composed of rules that allow or block sets of IPs that you define.
AWS has some sample Lambda functions here that you can use as a starting point. You would probably want a Lambda function to run on a schedule, obtain the list of IPs that you want to block, parse that list, and add new IPs found to your WAF's IP sets (or remove ones no longer on the list). The waf-tor-blocking and waf-reputation-lists functions in the above link provide good examples for how to do this.
I'm not sure exactly what you mean by detecting the bot name, but the standard WAF & Shield approach is currently to parse CloudFront logs delivered to an S3 bucket. Your S3 bucket would trigger SNS or a Lambda function directly whenever it receives a new gzipped log file. The Lambda function then downloads that file, parses it for malicious requests, and blocks the associated IP addresses. The waf-block-bad-behaving and waf-reactive-blacklist functions in the repo I linked to provide examples of how you would approach this. Occasionally you will see signatures of bad bots in the user-agent string of the request. The CloudFront logs include the user-agent string, so you could potentially parse it and block the associated IPs accordingly.
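The core of such a Lambda function is just an IP-set update; a hedged sketch against the classic WAF API used with CloudFront (the IP set ID and CIDRs are placeholders):

```python
import boto3

waf = boto3.client("waf")  # classic, global WAF, as used by the sample functions

def block_ips(ip_set_id: str, cidrs: list) -> None:
    # Insert the given CIDRs into an existing WAF IPSet; a scheduled or
    # log-triggered Lambda could call this after parsing CloudFront logs.
    token = waf.get_change_token()["ChangeToken"]
    waf.update_ip_set(
        IPSetId=ip_set_id,
        ChangeToken=token,
        Updates=[
            {"Action": "INSERT", "IPDescriptor": {"Type": "IPV4", "Value": cidr}}
            for cidr in cidrs
        ],
    )

block_ips("11111111-2222-3333-4444-555555555555", ["185.11.240.175/32"])
```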
Can we set a weighted routing policy on S3? If yes, what is the step-by-step process?
I tried that and ran into a problem: traffic is routed to one endpoint only.
I did some research on that and found it might be a problem with the CNAME configured in CloudFront.
Please also suggest the correct values for that.
S3 objects are only stored in a single region, meaning that in order to access a particular object, you must go through that region's API endpoint.
For example, if you had "image.jpg" stored in a bucket "s3-images" that was created in the eu-west-1 region, then in order to download that file you must go through the appropriate S3 endpoint for the eu-west-1 region:
s3-eu-west-1.amazonaws.com
If you try to use another endpoint, you will get an error pointing out that you are using the wrong one.
If your question is about using CloudFront in front of S3, you need to set your DNS CNAME to resolve to your CloudFront distribution's domain name in order for your users to be routed through CloudFront rather than hitting S3 directly:
[cdn.example.com] -CNAME-> [d12345.cloudfront.net] -> s3://some-bucket
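That DNS piece is a single record; a minimal boto3 sketch (the hosted zone ID, domain, and distribution hostname are placeholders matching the diagram above):

```python
import boto3

route53 = boto3.client("route53")

# Point cdn.example.com at the distribution's cloudfront.net hostname.
# (An alias A record to the distribution is the other common option.)
route53.change_resource_record_sets(
    HostedZoneId="Z0000000000000",  # placeholder hosted zone ID
    ChangeBatch={
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "cdn.example.com",
                    "Type": "CNAME",
                    "TTL": 300,
                    "ResourceRecords": [{"Value": "d12345.cloudfront.net"}],
                },
            }
        ]
    },
)
```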