Amazon announced in April 2018 the availability of a new storage class, S3 One Zone-Infrequent Access (One Zone-IA), that complements the plain Standard-IA option by lowering the cost through the use of only one AZ for storage.
Its site advertises that all storage classes have the so-called 11 nines (99.999999999%) durability.
My question is: how can the two options (Standard-IA vs. One Zone-IA) have the same 11 nines durability, given that the latter uses one AZ versus multiple AZs for the former?
Amazon S3 Standard, S3 Standard-IA, S3 One Zone-IA, and Amazon Glacier are all designed for 99.999999999% durability. Amazon S3 Standard, S3 Standard-IA and Amazon Glacier distribute data across a minimum of three geographically-separated Availability Zones to offer the highest level of resilience to AZ loss. S3 One Zone-IA saves cost by storing infrequently accessed data with lower resilience in a single Availability Zone. Amazon S3 Standard-IA is a good choice for long-term storage of master data that is infrequently accessed. For other infrequently accessed data, such as duplicates of backups or data summaries that can be regenerated, S3 One Zone-IA provides a lower price point.
P.S. I think they themselves are implying that One Zone-IA has lower durability:
One Zone-IA saves cost by storing infrequently accessed data with lower resilience in a single Availability Zone
S3 One Zone-IA is just as durable from an engineering standpoint as the other storage classes, except in the event that the Availability Zone where your data is stored is destroyed.
S3 One Zone-IA offers the same high durability†, high throughput, and low latency of Amazon S3 Standard and S3 Standard-IA
...
† Because S3 One Zone-IA stores data in a single AWS Availability Zone, data stored in this storage class will be lost in the event of Availability Zone destruction.
https://aws.amazon.com/s3/storage-classes/
So, the claim being made appears to be that 1ZIA is using the same engineering design as other storage classes -- redundant storage media -- except that everything is physically in a single AZ rather than distributed across multiple zones... so it offers comparable durability... except for the case of a catastrophic event involving that AZ.
It seems that having one AZ instead of several does reduce the availability:
Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA)
- Same low latency and high throughput performance of S3 Standard
- Designed for durability of 99.999999999% of objects in a single Availability Zone†
- Designed for 99.5% availability over a given year
- Backed with the Amazon S3 Service Level Agreement for availability
- Supports SSL for data in transit and encryption of data at rest
- S3 Lifecycle management for automatic migration of objects to other S3 Storage Classes
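If you do decide the trade-off is acceptable, placing an object in One Zone-IA is just a matter of setting the storage class when you write (or copy) the object. A minimal boto3 sketch; the bucket and key names are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Upload a new object directly into S3 One Zone-IA.
# "my-bucket" and "reports/summary.csv" are placeholder names.
with open("summary.csv", "rb") as f:
    s3.put_object(
        Bucket="my-bucket",
        Key="reports/summary.csv",
        Body=f,
        StorageClass="ONEZONE_IA",
    )

# Or change the storage class of an existing object by copying it onto itself.
s3.copy_object(
    Bucket="my-bucket",
    Key="reports/summary.csv",
    CopySource={"Bucket": "my-bucket", "Key": "reports/summary.csv"},
    StorageClass="ONEZONE_IA",
    MetadataDirective="COPY",
)
```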
Related
I want to configure my storage system to move inactive files into cold storage, a.k.a. S3 Glacier.
The rule I'm looking for is "if this file was not downloaded in the last 90 days, send it to S3 Glacier".
Lifecycle rules don't seem to work for that purpose, as they don't take into account whether the object is being accessed.
Any ideas?
This sounds like an ideal use for the S3 Intelligent-Tiering storage class.
From Using Amazon S3 storage classes - Amazon Simple Storage Service:
S3 Intelligent-Tiering is an Amazon S3 storage class designed to optimize storage costs by automatically moving data to the most cost-effective access tier, without operational overhead. It delivers automatic cost savings by moving data on a granular object level between access tiers when access patterns change. S3 Intelligent-Tiering is the perfect storage class when you want to optimize storage costs for data that has unknown or changing access patterns. There are no retrieval fees for S3 Intelligent-Tiering.
For a small monthly object monitoring and automation fee, S3 Intelligent-Tiering monitors the access patterns and moves the objects automatically from one tier to another. It works by storing objects in four access tiers: two low latency access tiers optimized for frequent and infrequent access, and two opt-in archive access tiers designed for asynchronous access that are optimized for rare access.
Objects that are uploaded or transitioned to S3 Intelligent-Tiering are automatically stored in the Frequent Access tier. S3 Intelligent-Tiering works by monitoring access patterns and then moving the objects that have not been accessed in 30 consecutive days to the Infrequent Access tier. After you activate one or both of the archive access tiers, S3 Intelligent-Tiering automatically moves objects that haven’t been accessed for 90 consecutive days to the Archive Access tier, and after 180 consecutive days of no access, to the Deep Archive Access tier.
In order to access archived objects later, you first need to restore them.
Note: The S3 Intelligent-Tiering storage class is suitable for objects larger than 128 KB that you plan to store for at least 30 days. If the size of an object is less than 128 KB, it is not eligible for auto-tiering. Smaller objects can be stored, but they are always charged at the frequent access tier rates in the S3 Intelligent-Tiering storage class. If you delete an object before the end of the 30-day minimum storage duration period, you are charged for 30 days. For pricing information, see Amazon S3 pricing.
See also: Announcing S3 Intelligent-Tiering — a New Amazon S3 Storage Class
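If you want to try this, a rough boto3 sketch that transitions existing objects into S3 Intelligent-Tiering via a lifecycle rule and opts the bucket into the archive tiers might look like the following (the bucket name and rule IDs are made up):

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # placeholder bucket name

# Lifecycle rule: move objects into the S3 Intelligent-Tiering storage class.
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "move-to-intelligent-tiering",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # whole bucket
                "Transitions": [
                    {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}
                ],
            }
        ]
    },
)

# Opt in to the Archive Access and Deep Archive Access tiers, so objects
# not accessed for 90 / 180 consecutive days are archived automatically.
s3.put_bucket_intelligent_tiering_configuration(
    Bucket=bucket,
    Id="archive-tiers",
    IntelligentTieringConfiguration={
        "Id": "archive-tiers",
        "Status": "Enabled",
        "Tierings": [
            {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
            {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    },
)
```

With that in place, the tiering itself (30 days to Infrequent Access, 90/180 days to the archive tiers) is handled by S3; you only pay the per-object monitoring fee.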
I have a static site hosted in S3 that I need to front with CloudFront; in other words, I have no option but to put CloudFront in front of it. I would like to reduce my S3 costs by changing the objects' storage class to S3 Standard-Infrequent Access (Standard-IA), which would cut my S3 storage costs by roughly 45%, which is nice since I now have to spend money on CloudFront. Is this a good practice, given that the resources will be cached by CloudFront anyway? Standard-IA has 99.9% availability, which means it can have as much as 8.75 hours of downtime per year.
First, don't worry about the downtime. Unless you are using Reduced Redundancy or One-Zone Storage, all data on S3 has pretty much the same redundancy and therefore very high availability.
S3 Standard-IA is roughly half the price for storage ($0.0125 per GB) compared to S3 Standard ($0.023 per GB). However, the data retrieval cost for Standard-IA is $0.01 per GB. Thus, if the data is retrieved more than once per month, Standard-IA is more expensive.
While using Amazon CloudFront in front of S3 would reduce data access frequency, it's worth noting that CloudFront caches separately in each region. So, if users in Singapore, Sydney and Tokyo all requested the data, it would be fetched three times from S3. So, data stored as Standard-IA would incur 3 x $0.01 per GB charges, making it much more expensive.
See: Announcing Regional Edge Caches for Amazon CloudFront
Bottom line: If the data is going to be accessed at least once per month, it is cheaper to use Standard Storage instead of Standard-Infrequent Access.
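To make the break-even explicit, here is the arithmetic above in a few lines of Python (prices as quoted in this answer; they vary by region and change over time):

```python
# Prices per GB as quoted above; check current S3 pricing before relying on them.
STANDARD_STORAGE = 0.023   # $/GB-month, S3 Standard
IA_STORAGE = 0.0125        # $/GB-month, S3 Standard-IA
IA_RETRIEVAL = 0.01        # $/GB per retrieval, S3 Standard-IA

def monthly_cost_per_gb(retrievals_per_month):
    """Return (standard_cost, standard_ia_cost) per GB for one month."""
    standard = STANDARD_STORAGE
    standard_ia = IA_STORAGE + retrievals_per_month * IA_RETRIEVAL
    return standard, standard_ia

# One "access" through CloudFront can mean several S3 fetches (one per
# regional edge cache), so a single download by users in three regions
# roughly corresponds to 3 retrievals here.
for n in range(4):
    std, ia = monthly_cost_per_gb(n)
    cheaper = "Standard-IA" if ia < std else "Standard"
    print(f"{n} retrieval(s)/month: Standard ${std:.4f}, Standard-IA ${ia:.4f} -> {cheaper}")
```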
Amazon S3 guarantees that data uploaded to a bucket will be spread across >= 3 AZs (scroll down for the chart). When we create a bucket, we choose a region. How does Amazon manage this AZ count when we create a bucket in a region that has only two AZs?
Here's the answer from the AWS S3 FAQ. Apparently, in those cases, more AZs exist, but they are not publicly available:
Q: What is an AWS Availability Zone (AZ)?
An AWS Availability Zone is an isolated location within an AWS Region. Within each AWS Region, S3 operates in a minimum of three AZs, each separated by miles to protect against local events like fires, floods, etc.
Amazon S3 Standard, S3 Standard-Infrequent Access, and S3 Glacier storage classes replicate data across a minimum of three AZs to protect against the loss of one entire AZ. This remains true in Regions where fewer than three AZs are publicly available. Objects stored in these storage classes are available for access from all of the AZs in an AWS Region.
The Amazon S3 One Zone-IA storage class replicates data within a single AZ. Data stored in this storage class is susceptible to loss in an AZ destruction event.
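As an aside, you can see which AZs are publicly exposed to your account in a given region via the EC2 API; S3 may use additional AZs internally that never show up there. A minimal boto3 sketch (the region name is just an example):

```python
import boto3

# List the Availability Zones that are publicly exposed in a region.
# us-west-1 is just an example region; S3 may replicate across AZs
# that this call does not return.
ec2 = boto3.client("ec2", region_name="us-west-1")

for az in ec2.describe_availability_zones()["AvailabilityZones"]:
    print(az["ZoneName"], az["State"])
```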
I've seen many environments where critical data is backed up to Amazon S3 and it is assumed that this will basically never fail.
I know that Amazon reports that data stored in S3 has 99.999999999% durability (11 9's), but one thing that I'm struck by is the following passage from the AWS docs:
Amazon S3 provides a highly durable storage infrastructure designed for mission-critical and primary data storage. Objects are redundantly stored on multiple devices across multiple facilities in an Amazon S3 region.
So, S3 objects are only replicated within a single AWS region. Say there's an earthquake in N. California that decimates the whole region. Does that mean N. California S3 data has gone with it?
I'm curious: what do others consider best practices with respect to persisting mission-critical data in S3?
Given that S3 has 99.999999999% durability [1], what is the equivalent figure for DynamoDB?
[1] http://aws.amazon.com/s3/
This question implies something that is incorrect. Though S3 has an SLA (aws.amazon.com/s3-sla), that SLA covers availability (99.9%) and makes no reference to durability or the loss of objects in S3.
The 99.999999999% durability figure comes from Amazon's estimate of what S3 is designed to achieve and there is no related SLA.
Note that Amazon S3 is designed for 99.99% availability but the SLA kicks in at 99.9%.
There is no current DynamoDB SLA from Amazon, nor am I aware of any published figures from Amazon on the expected or designed durability of data in DynamoDB. I would suspect that it is less than S3 given the nature, relative complexities, and goals of the two systems (i.e., S3 is designed to simply store data objects very, very safely; DynamoDB is designed to provide super-fast reads and writes in a scalable distributed database while also trying to keep your data safe).
Amazon talks about customers backing up DynamoDB to S3 using MapReduce. They also say that some customers back up DynamoDB using Redshift, which has DynamoDB compatibility built in. I additionally recommend backing up to an off-AWS store to remove the single point of failure that is your AWS account.
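For small tables, a much simpler alternative to the MapReduce pipeline Amazon describes is to scan the table and write a JSON snapshot to S3. A rough boto3 sketch; the table name, bucket name, and object key are placeholders, and this approach is not suitable for large tables:

```python
import json
from decimal import Decimal

import boto3

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")

table = dynamodb.Table("my-table")   # placeholder table name
backup_bucket = "my-backup-bucket"   # placeholder bucket name


def _default(obj):
    # DynamoDB returns numbers as Decimal and set attributes as Python sets,
    # neither of which the json module can serialize directly.
    if isinstance(obj, Decimal):
        return str(obj)
    if isinstance(obj, set):
        return sorted(obj)
    raise TypeError(f"Cannot serialize {type(obj)}")


# Scan the whole table, following pagination.
items = []
scan_kwargs = {}
while True:
    response = table.scan(**scan_kwargs)
    items.extend(response["Items"])
    if "LastEvaluatedKey" not in response:
        break
    scan_kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]

# Write the snapshot to S3 as a single JSON object.
s3.put_object(
    Bucket=backup_bucket,
    Key="dynamodb-backups/my-table.json",
    Body=json.dumps(items, default=_default).encode("utf-8"),
)
```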
Although the DynamoDB FAQ doesn't use exactly the same wording, as you can see from my highlights below, both DynamoDB and S3 are designed to be fault tolerant, with data stored in three facilities.
I wasn't able to find exact figures reported anywhere, but from the information I did have it looks like DynamoDB is pretty durable (on par with S3), although that won't stop it from having service interruptions from time to time. See this link:
http://www.forbes.com/sites/kellyclay/2013/02/20/amazons-aws-experiencing-problems-again/
S3 FAQ: http://aws.amazon.com/s3/faqs/#How_is_Amazon_S3_designed_to_achieve_99.999999999%_durability
Q: How durable is Amazon S3? Amazon S3 is designed to provide 99.999999999% durability of objects over a given year. In addition, Amazon S3 is designed to sustain the concurrent loss of data in two facilities.
Also Note: The "99.999999999%" figure for S3 is over a given year.
DynamoDB FAQ: http://aws.amazon.com/dynamodb/faqs/#Is_there_a_limit_to_how_much_data_I_can_store_in_Amazon_DynamoDB
Scale, Availability, and Durability
Q: How highly available is Amazon DynamoDB?
The service runs across Amazon’s proven, high-availability data centers. The service replicates data across three facilities in an AWS Region to provide fault tolerance in the event of a server failure or Availability Zone outage.
Q: How does Amazon DynamoDB achieve high uptime and durability?
To achieve high uptime and durability, Amazon DynamoDB synchronously replicates data across three facilities within an AWS Region.