What does withRegion(Regions) on AmazonS3ClientBuilder actually set? The AWS documentation only says "It sets the region to be used by the client."
Is it the region where our application is running, so that latency is minimized because the client reads from the S3 endpoint in the same region where it is deployed?
Or is it the region where the S3 bucket resides?
Sample line of code:
AmazonS3 amazonS3 = AmazonS3ClientBuilder.standard()
        .withRegion(Regions.US_EAST_1).build();
Please don't guess; a URL (preferably from docs.aws.amazon.com) to support your explanation would be highly appreciated.
https://docs.aws.amazon.com/general/latest/gr/rande.html
Some services, such as IAM, do not support regions; therefore, their endpoints do not include a region. Some services, such as Amazon EC2, let you specify an endpoint that does not include a specific region, for example, https://ec2.amazonaws.com. In that case, AWS routes the endpoint to us-east-1.
If a service supports regions, the resources in each region are independent. For example, if you create an Amazon EC2 instance or an Amazon SQS queue in one region, the instance or queue is independent from instances or queues in another region.
In this case, S3 buckets are created in specific regions and there are multiple regional REST endpoints you can access. For S3, you must connect to the same region as the bucket (except for region-agnostic calls such as ListAllMyBuckets); for other services you do not.
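To make that concrete, here is a minimal sketch with the v1 Java SDK (the bucket name, key, and regions are made up): look up the bucket's region first, then build the client with withRegion matching that bucket.

import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class RegionAwareS3Client {
    public static void main(String[] args) {
        String bucket = "my-example-bucket"; // hypothetical bucket name

        // A "bootstrap" client; getBucketLocation is generally answered by the
        // us-east-1 endpoint even for buckets that live in other regions.
        AmazonS3 bootstrap = AmazonS3ClientBuilder.standard()
                .withRegion(Regions.US_EAST_1)
                .build();

        // Returns the bucket's LocationConstraint, e.g. "eu-west-1". Legacy values:
        // "US" (or an empty string) means us-east-1 and "EU" means eu-west-1.
        String location = bootstrap.getBucketLocation(bucket);
        if (location == null || location.isEmpty() || location.equals("US")) {
            location = "us-east-1";
        } else if (location.equals("EU")) {
            location = "eu-west-1";
        }

        // This is the client to use for object operations: withRegion matches the
        // region where the bucket lives, so requests are routed correctly.
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion(location)
                .build();

        System.out.println(bucket + " lives in " + location);
        s3.getObjectMetadata(bucket, "some/key.txt"); // served from the bucket's own region
    }
}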
https://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region
As you point out, the Javadoc for AmazonS3ClientBuilder is incredibly vague, because it inherits the withRegion documentation from AwsClientBuilder, which is shared by services that support regions and services that do not.
To further add to the confusion, particularly when reading older advice scattered around the internet, it used to be possible to access a bucket in any region from anywhere with the S3 Java API (though such calls may be slower). It is possible to revert to this behaviour with withForceGlobalBucketAccessEnabled:
https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Builder.html#withForceGlobalBucketAccessEnabled-java.lang.Boolean-
Configure whether global bucket access is enabled for clients generated by this builder.
When global bucket access is enabled, the region to which a request is routed may differ from the region that is configured in AwsClientBuilder.setRegion(String) in order to make the request succeed.
The following behavior is currently used when this mode is enabled:
All requests that do not act on an existing bucket (for example, AmazonS3Client.createBucket(String)) will be routed to the region configured by AwsClientBuilder.setRegion(String), unless the region is manually overridden with CreateBucketRequest.setRegion(String), in which case the request will be routed to the region configured in the request.
The first time a request is made that references an existing bucket (for example, AmazonS3Client.putObject(PutObjectRequest)) a request will be made to the region configured by AwsClientBuilder.setRegion(String) to determine the region in which the bucket was created. This location may be cached in the client for subsequent requests acting on that same bucket.
Enabling this mode has several drawbacks, because it has the potential to increase latency in the event that the location of the bucket is physically far from the location from which the request was invoked. For this reason, it is strongly advised when possible to know the location of your buckets and create a region-specific client to access that bucket.
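For completeness, here is a minimal sketch of what enabling that behaviour looks like with the v1 Java SDK (the bucket name is hypothetical): the client is configured for us-east-1, but will follow an existing bucket to whatever region it actually lives in.

import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class GlobalBucketAccessExample {
    public static void main(String[] args) {
        // One client that can reach buckets in any region, at the cost of the
        // extra region lookup and potential latency described in the Javadoc above.
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion(Regions.US_EAST_1)              // used for createBucket and the initial lookup
                .withForceGlobalBucketAccessEnabled(true)   // follow existing buckets to their own region
                .build();

        // Hypothetical bucket that actually lives in eu-west-1: the client first
        // resolves the bucket's region, then routes the request there.
        s3.putObject("some-bucket-in-eu-west-1", "hello.txt", "hello");
    }
}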
I have been trying to understand why an S3 bucket name has to be globally unique. I came across a Stack Overflow answer saying that the bucket name has to be unique so that the Host header can be resolved. However, my point is: can't AWS direct s3-region.amazonaws.com to a region-specific web server that serves the bucket's objects from that region? That way the name would only have to be unique within a region, meaning the same bucket name could be created in a different region. Please let me know if my understanding of how name resolution works is wrong.
There is not, strictly speaking, a technical reason why the bucket namespace absolutely had to be global. In fact, it technically isn't quite as global as most people might assume, because S3 has three distinct partitions that are completely isolated from each other and do not share the same global bucket namespace across partition boundaries -- the partitions are aws (the global collection of regions most people know as "AWS"), aws-us-gov (US GovCloud), and aws-cn (the Beijing and Ningxia isolated regions).
So things could have been designed differently, with each region independent, but that is irrelevant now, because the global namespace is entrenched.
But why?
The specific reasons for the global namespace aren't publicly stated, but almost certainly have to do with the evolution of the service, backwards compatibility, and ease of adoption of new regions.
S3 is one of the oldest of the AWS services, older than even EC2. They almost certainly did not foresee how large it would become.
Originally, the namespace was global of necessity because there weren't multiple regions. S3 had a single logical region (called "US Standard" for a long time) that was in fact comprised of at least two physical regions, in or near us-east-1 and us-west-2. You didn't know or care which physical region each upload went to, because they replicated back and forth, transparently, and latency-based DNS resolution automatically gave you the endpoint with the lowest latency. Many users never knew this detail.
You could even explicitly override the automatic geo-routing of DNS and upload to the east using the s3-external-1.amazonaws.com endpoint or to the west using the s3-external-2.amazonaws.com endpoint, but your object would shortly be accessible from either endpoint.
Up until this point, S3 did not offer immediate read-after-write consistency on new objects since that would be impractical in the primary/primary, circular replication environment that existed in earlier days.
Eventually, S3 launched in other AWS regions as they came online, but they designed it so that a bucket in any region could be accessed as ${bucket}.s3.amazonaws.com.
This used DNS to route the request to the correct region, based on the bucket name in the hostname, and S3 maintained the DNS mappings. *.s3.amazonaws.com was (and still is) a wildcard record that pointed everything to "S3 US Standard" but S3 would create a CNAME for your bucket that overrode the wildcard and pointed to the correct region, automatically, a few minutes after bucket creation. Until then, S3 would return a temporary HTTP redirect. This, obviously enough, requires a global bucket namespace. It still works for all but the newest regions.
But why did they do it that way? After all, at around the same time S3 also introduced endpoints in the style ${bucket}.s3-${region}.amazonaws.com ¹ that are actually wildcard DNS records: *.s3-${region}.amazonaws.com routes directly to the regional S3 endpoint for each S3 region, and is a responsive (but unusable) endpoint, even for nonexistent buckets. If you create a bucket in us-east-2 and send a request for that bucket to the eu-west-1 endpoint, S3 in eu-west-1 will throw an error, telling you that you need to send the request to us-east-2.
Also, around this time, they quietly dropped the whole east/west replication thing, and later renamed US Standard to what it really was at that point -- us-east-1. (Buttressing the "backwards compatibility" argument, s3-external-1 and s3-external-2 are still valid endpoints, but they both point to precisely the same place, in us-east-1.)
So why did the bucket namespace remain global? The only truly correct answer an outsider can give is "because that's what they decided to do."
But perhaps one factor was that AWS wanted to preserve compatibility with existing software that used ${bucket}.s3.amazonaws.com so that customers could deploy buckets in other regions without code changes. In the old days of Signature Version 2 (and earlier), the code that signed requests did not need to know the API endpoint region. Signature Version 4 requires knowledge of the endpoint region in order to generate a valid signature because the signing key is derived against the date, region, and service... but previously it wasn't like that, so you could just drop in a bucket name and client code needed no regional awareness -- or even awareness that S3 had regions -- in order to work with a bucket in any region.
AWS is well-known for its practice of preserving backwards compatibility. They do this so consistently that occasionally some embarrassing design errors creep in and remain unfixed because to fix them would break running code.²
Another issue is virtual hosting of buckets. Back before HTTPS was accepted as non-optional, it was common to host static content by pointing your CNAME to the S3 endpoint. If you pointed www.example.com to S3, it would serve the content from a bucket with the exact name www.example.com. You can still do this, but it isn't useful any more since it doesn't support HTTPS. To host static S3 content with HTTPS, you use CloudFront in front of the bucket. Since CloudFront rewrites the Host header, the bucket name can be anything. You might be asking why you couldn't just point the www.example.com CNAME to the endpoint hostname of your bucket, but HTTP and DNS operate at very different layers and it simply doesn't work that way. (If you doubt this assertion, try pointing a CNAME from a domain that you control to www.google.com. You will not find that your domain serves the Google home page; instead, you'll be greeted with an error because the Google server will only see that it's received a request for www.example.com, and be oblivious to the fact that there was an intermediate CNAME pointing to it.) Virtual hosting of buckets requires either a global bucket namespace (so the Host header exactly matches the bucket) or an entirely separate mapping database of hostnames to bucket names... and why do that when you already have an established global namespace of buckets?
¹ Note that the - after s3 in these endpoints was eventually replaced by a much more logical . but these old endpoints still work.
² two examples that come to mind: (1) S3's incorrect omission of the Vary: Origin response header when a non-CORS request arrives at a CORS-enabled bucket (I have argued, to no avail, that this could be fixed without breaking anything); (2) S3's blatantly incorrect handling of the + symbol in an object key on the API, where the service interprets + as meaning %20 (space), so if you want a browser to download from a link to /foo+bar you have to upload it as /foo{space}bar.
You create an S3 bucket in a specific region only, and the objects stored in that bucket are stored only in that region. The data is neither replicated to nor stored in other regions unless you set up replication on a per-bucket basis.
However, Amazon S3 shares a global namespace across all accounts, so the name given to an S3 bucket must be unique.
This requirement exists to support a globally unique DNS name for each bucket, e.g. http://bucketname.s3.amazonaws.com
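To make that concrete, a small sketch with the v1 Java SDK (the bucket name is hypothetical): the name must be globally unique, but the bucket and its objects live only in the region of the client that created it unless you configure replication.

import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class CreateRegionalBucket {
    public static void main(String[] args) {
        // Client pointed at the region where the bucket should live.
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion(Regions.EU_WEST_1)
                .build();

        // The name must be unique across the whole (global) namespace, but the
        // bucket itself -- and every object you put into it -- lives only in eu-west-1.
        s3.createBucket("some-globally-unique-bucket-name");
        s3.putObject("some-globally-unique-bucket-name", "data.txt", "stays in eu-west-1");
    }
}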
I found out Amazon S3 gives me different IP addresses when I try to access the same resource: 13.111.11.11 and 13.222.11.11. Do they point to the same server location or to different locations? If a file gets updated, does accessing it via the two different IPs make a difference in terms of which one returns the updated file first?
You should always access Amazon S3 by using the provided DNS name (e.g. my-bucket.s3-us-west-2.amazonaws.com).
There are many, many servers powering Amazon S3 so you should never cache nor use a particular IP address. Also, if you are using a DNS name that resolves to a particular bucket (like the example above), S3 requires the domain name to know which bucket to access (since an IP address alone is insufficient).
The Amazon S3 Data Consistency Model says:
Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all regions ... Amazon S3 offers eventual consistency for overwrite PUTS and DELETES in all regions.
So, new files will always return the file, but updates might be subject to a short delay as the change is propagated between servers.
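If you want the exact hostname/URL the SDK itself would use (rather than an IP address you resolved yourself), here is a small sketch with the v1 Java SDK, using a made-up bucket and key:

import java.net.URL;

import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class BucketUrlExample {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion(Regions.US_WEST_2)
                .build();

        // Ask the SDK for the bucket/key URL instead of resolving and caching an
        // IP address yourself; the hostname is what S3 uses to pick the bucket.
        URL url = s3.getUrl("my-bucket", "path/to/object.txt");
        System.out.println(url); // something like https://my-bucket.s3.us-west-2.amazonaws.com/path/to/object.txt
    }
}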
The application is using carrierwave in conjunction with the carrierwave-aws gem. It hit snags when I migrated Rails versions (bumped up to 4.2) and Ruby versions (2.2.3) and redeployed to the same staging server.
The AWS bucket was initially created in the free tier, thus Oregon, us-west-2. However, I have found that all the S3 files have a property linking them to eu-west-1. Admittedly I've been tinkering around and considered using the eu-west-1 region, but I do not recall making any config changes; I'm not even sure that's allowed in the free tier...
So yes, I've had to configure my uploads initializer with:
config.asset_host = 'https://s3-eu-west-1.amazonaws.com/theapp'
config.aws_credentials = {
  region: 'eu-west-1'
}
Now the console for AWS is accessible with a URL that includes region=us-west-2
I do not understand how this got to be and am looking for suggestions.
In spite of appearances, an AWS account doesn't have a "home" (native) region.
The console defaults to us-west-2 (Oregon), and conventional wisdom suggests that this is a region where AWS has the most available/spare resources, lower operational costs, lower pricing for customers, and fewest constraints for growth, so that in the event a user does not have enough information at hand to actively select a region where they deploy services, Oregon will be used by default.
But for each account, no particular region has any special standing. If you switch regions in the console, the console will tend to open up to the same region next time.
Most AWS services -- EC2, SQS, SNS, RDS (to name a few) are strictly regional: the regions are independent and not connected together¹, in the interest of reliability and survivability. When you're in a given region in the console, you can only see EC2 resources in that region, SQS queues in that region, SNS topics in that region, etc. To see your resources in other regions, you switch regions in the console.
When making API requests to these services, you're using an endpoint in that region, and the request signature (Signature Version 4) also incorporates that region.
Other services are global, with centralized administration -- examples here are CloudFront, IAM, and Route 53 hosted zones. When you make requests to these services, you always use the region "us-east-1" because that's the home location of those services' central, global administration. These tend to be services designed to tolerate a partitioning event (where one part of the global network is isolated from another): administrative changes are replicated out around the world, but once the provisioning has been replicated, the regional installations can operate autonomously without major service impact. When you select these services in the console, you'll note that the region changes to "Global."
S3 is a hybrid that is different from essentially all of the others. When you select S3 in the console, you'll notice that the console region also changes to show "Global" and you can see all of your buckets, like other global services. S3 has independently operating regions, but a global namespace. The regions are logically connected and can communicate administrative messages among themselves and can transfer data across regions (but only when you do this deliberately -- otherwise, data remains in the region where you stored it).
Unlike the other global services, S3 does not have a single global endpoint that can handle every possible request.
Each time you create a bucket, you choose the region where you want the bucket to live. Subsequent requests related to that bucket have to be submitted to the bucket's region, and must be signed for the correct region.
If you submit a request to another S3 region's endpoint for that bucket, you'll receive an error telling you the correct region for the bucket.
< HTTP/1.1 301 Moved Permanently
< x-amz-bucket-region: us-east-1
< Server: AmazonS3
<Error>
<Code>PermanentRedirect</Code>
<Message>The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.</Message>
<Bucket>...</Bucket>
<Endpoint>s3.amazonaws.com</Endpoint>
<RequestId>...</RequestId>
<HostId>...</HostId>
</Error>
Conversely, if you send an S3 request to the correct endpoint but using the wrong region in your authentication credentials, you'll receive a different error for similar reasons:
< HTTP/1.1 400 Bad Request
< x-amz-bucket-region: us-west-2
< Server: AmazonS3
<
<?xml version="1.0" encoding="UTF-8"?>
<Error>
<Code>AuthorizationQueryParametersError</Code>
<Message>Error parsing the X-Amz-Credential parameter; the region 'eu-west-1' is wrong; expecting 'us-west-2'</Message>
<Region>us-west-2</Region>
<RequestId>...</RequestId>
<HostId>...</HostId>
</Error>
Again, this region is the region where you created the bucket, or the default "US Standard" (us-east-1). Once a bucket has been created, it can't be moved to a different region. The only way to "move" a bucket to a different region without the name changing is to remove all the files from the bucket, delete the bucket (you can't delete a non-empty bucket), wait a few minutes, and create the bucket in the new region. During the few minutes that S3 requires before the name is globally available after deleting a bucket, it's always possible that somebody else could take the bucket name for themselves... so choose your bucket region carefully.
S3 API interactions were edited and reformatted for clarity; some unrelated headers and other content was removed.
¹ not connected together seems (at first glance) to be contradicted in the sense that -- for example -- you can subscribe an SQS queue in one region to an SNS topic in another, you can replicate RDS from one region to another, and you can transfer EBS snapshots and AMIs from one region to another... but these back-channels are not under discussion, here. The control planes of the services in each region are isolated and independent. A problem with the RDS infrastructure in one region might disrupt replication to RDS in another region, but would not impact RDS operations in the other region. An SNS outage in one region would not impact SNS in another. The systems and services sometimes have cross-region communication capability for handling customer requested services, but each region's core operations for these regional services is independent.
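For illustration, here is a minimal sketch (hypothetical bucket name and regions, v1 Java SDK) of the only way to "move" a bucket described above: empty it, delete it, wait, and re-create the name in the new region. This is not production code; it ignores pagination, versioned objects, and the race for the bucket name during the waiting period.

import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;

public class MoveBucketRegion {
    public static void main(String[] args) throws InterruptedException {
        String bucket = "my-bucket"; // hypothetical; assumed to live in us-west-2

        AmazonS3 oldRegion = AmazonS3ClientBuilder.standard()
                .withRegion(Regions.US_WEST_2).build();
        AmazonS3 newRegion = AmazonS3ClientBuilder.standard()
                .withRegion(Regions.EU_WEST_1).build();

        // 1. Empty the bucket (a real bucket may need pagination via
        //    listNextBatchOfObjects, plus handling of versioned objects).
        ObjectListing listing = oldRegion.listObjects(bucket);
        for (S3ObjectSummary summary : listing.getObjectSummaries()) {
            oldRegion.deleteObject(bucket, summary.getKey());
        }

        // 2. Delete the (now empty) bucket, then wait for the name to become
        //    available again -- during this window someone else could claim it.
        oldRegion.deleteBucket(bucket);
        Thread.sleep(5 * 60 * 1000L); // crude "a few minutes" wait, for illustration only

        // 3. Re-create the same name in the new region.
        newRegion.createBucket(bucket);
    }
}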
Using S3 cross-region replication, if a user downloads http://mybucket.s3.amazonaws.com/myobject , will it automatically be downloaded from the closest region, like CloudFront? That is, is there no need to specify the region in the URL, like http://mybucket.s3-[region].amazonaws.com/myobject ?
http://aws.amazon.com/about-aws/whats-new/2015/03/amazon-s3-introduces-cross-region-replication/
Bucket names are global, and cross-region replication involves copying your objects to a different bucket.
In other words, example.us-west-1 and example.us-east-1 are not valid, as there can only be one bucket named 'example'.
That's implied in the announcement post -- Mr. Barr uses buckets named jbarr and jbarr-replication.
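For illustration, a minimal sketch of setting up replication between two differently named buckets with the v1 Java SDK; the bucket names and the IAM role ARN are placeholders, versioning must be enabled on both sides, and details such as storage class and rule filters are left out.

import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.BucketReplicationConfiguration;
import com.amazonaws.services.s3.model.BucketVersioningConfiguration;
import com.amazonaws.services.s3.model.ReplicationDestinationConfig;
import com.amazonaws.services.s3.model.ReplicationRule;
import com.amazonaws.services.s3.model.ReplicationRuleStatus;
import com.amazonaws.services.s3.model.SetBucketVersioningConfigurationRequest;

public class CrossRegionReplicationSketch {
    public static void main(String[] args) {
        // Two differently named buckets in two regions, jbarr / jbarr-replication style.
        String source = "example-source";           // hypothetical, lives in us-east-1
        String destination = "example-replication"; // hypothetical, lives in us-west-1

        AmazonS3 east = AmazonS3ClientBuilder.standard().withRegion(Regions.US_EAST_1).build();
        AmazonS3 west = AmazonS3ClientBuilder.standard().withRegion(Regions.US_WEST_1).build();

        // Replication requires versioning on both buckets.
        east.setBucketVersioningConfiguration(new SetBucketVersioningConfigurationRequest(
                source, new BucketVersioningConfiguration(BucketVersioningConfiguration.ENABLED)));
        west.setBucketVersioningConfiguration(new SetBucketVersioningConfigurationRequest(
                destination, new BucketVersioningConfiguration(BucketVersioningConfiguration.ENABLED)));

        // The rule points at the *other* bucket's ARN; the IAM role ARN is a
        // placeholder for a role that allows S3 to replicate on your behalf.
        ReplicationRule rule = new ReplicationRule()
                .withStatus(ReplicationRuleStatus.Enabled)
                .withPrefix("") // replicate everything
                .withDestinationConfig(new ReplicationDestinationConfig()
                        .withBucketARN("arn:aws:s3:::" + destination));

        east.setBucketReplicationConfiguration(source, new BucketReplicationConfiguration()
                .withRoleARN("arn:aws:iam::123456789012:role/s3-replication-role") // hypothetical
                .addRule("replicate-all", rule));
    }
}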
Using S3 cross-Region replication will put your object into two (or more) buckets in two different Regions.
If you want a single access point that will choose the closest available bucket, then you want to use Multi-Region Access Points (MRAP).
MRAP makes use of Global Accelerator and puts bucket requests onto the AWS backbone at the closest edge location, which provides a faster, more reliable connection to the actual bucket. Global Accelerator also chooses the closest available bucket; if a bucket is not available, it will serve the request from the other bucket, providing automatic failover.
You can also configure it in an active/passive configuration, always serving from one bucket until you initiate a failover.
The MRAP page in the AWS console even shows you a graphical representation of your replication rules.
S3 is a global service in the sense that bucket names are global: the name has to be unique across all of S3, and you don't specify a region as part of the name.
When you create an S3 bucket you do specify a region, but that doesn't mean you have to put the region name in the URL to access it. To speed up access from other regions, there are several options:
-- Amazon S3 Transfer Acceleration with the same bucket name (see the sketch below).
-- Or set up another bucket with a different name in a different region, enable cross-region replication, and create a CloudFront origin group with the two buckets as origins.
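Here is a minimal sketch of the Transfer Acceleration option with the v1 Java SDK; the bucket name is hypothetical and the bucket is assumed to live in eu-west-1.

import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.BucketAccelerateConfiguration;
import com.amazonaws.services.s3.model.BucketAccelerateStatus;
import com.amazonaws.services.s3.model.SetBucketAccelerateConfigurationRequest;

public class TransferAccelerationSketch {
    public static void main(String[] args) {
        String bucket = "my-bucket"; // hypothetical; created in eu-west-1

        // Turn acceleration on for the bucket (a one-time configuration change).
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion(Regions.EU_WEST_1)
                .build();
        s3.setBucketAccelerateConfiguration(new SetBucketAccelerateConfigurationRequest(
                bucket, new BucketAccelerateConfiguration(BucketAccelerateStatus.Enabled)));

        // Far-away clients then use an accelerate-enabled client; requests go to the
        // nearest edge location and travel the AWS backbone to the bucket's region.
        AmazonS3 accelerated = AmazonS3ClientBuilder.standard()
                .withRegion(Regions.EU_WEST_1)
                .withAccelerateModeEnabled(true)
                .build();
        accelerated.putObject(bucket, "upload-from-far-away.txt", "hello");
    }
}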