I want to configure my storage system to move inactive files into cold storage, i.e. S3 Glacier.
The rule I'm looking for is "if this file was not downloaded in the last 90 days, send it to S3 Glacier".
Lifecycle rules don't seem to work for that purpose, as they don't take into account whether the object is actually being accessed.
Any ideas?
This sounds like an ideal use for the S3 Intelligent-Tiering storage class.
From Using Amazon S3 storage classes - Amazon Simple Storage Service:
S3 Intelligent-Tiering is an Amazon S3 storage class designed to optimize storage costs by automatically moving data to the most cost-effective access tier, without operational overhead. It delivers automatic cost savings by moving data on a granular object level between access tiers when access patterns change. S3 Intelligent-Tiering is the perfect storage class when you want to optimize storage costs for data that has unknown or changing access patterns. There are no retrieval fees for S3 Intelligent-Tiering.
For a small monthly object monitoring and automation fee, S3 Intelligent-Tiering monitors the access patterns and moves the objects automatically from one tier to another. It works by storing objects in four access tiers: two low latency access tiers optimized for frequent and infrequent access, and two opt-in archive access tiers designed for asynchronous access that are optimized for rare access.
Objects that are uploaded or transitioned to S3 Intelligent-Tiering are automatically stored in the Frequent Access tier. S3 Intelligent-Tiering works by monitoring access patterns and then moving the objects that have not been accessed in 30 consecutive days to the Infrequent Access tier. After you activate one or both of the archive access tiers, S3 Intelligent-Tiering automatically moves objects that haven’t been accessed for 90 consecutive days to the Archive Access tier, and after 180 consecutive days of no access, to the Deep Archive Access tier.
In order to access archived objects later, you first need to restore them.
Note: The S3 Intelligent-Tiering storage class is suitable for objects larger than 128 KB that you plan to store for at least 30 days. If the size of an object is less than 128 KB, it is not eligible for auto-tiering. Smaller objects can be stored, but they are always charged at the frequent access tier rates in the S3 Intelligent-Tiering storage class. If you delete an object before the end of the 30-day minimum storage duration period, you are charged for 30 days. For pricing information, see Amazon S3 pricing.
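As a rough boto3 sketch of how this could be wired up (the bucket name and configuration ID below are placeholders, and the 90/180-day thresholds mirror the archive tiers described above), you could transition objects into the Intelligent-Tiering storage class with a lifecycle rule and then opt the bucket into the archive tiers:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"  # placeholder bucket name

# Lifecycle rule: move objects into the S3 Intelligent-Tiering storage class.
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "move-to-intelligent-tiering",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}
                ],
            }
        ]
    },
)

# Opt the bucket into the Archive Access tiers, so objects not accessed for
# 90 / 180 consecutive days move to Archive / Deep Archive Access.
s3.put_bucket_intelligent_tiering_configuration(
    Bucket=bucket,
    Id="archive-unused-objects",  # placeholder configuration ID
    IntelligentTieringConfiguration={
        "Id": "archive-unused-objects",
        "Status": "Enabled",
        "Tierings": [
            {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
            {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    },
)
```

New uploads can also be placed directly into the class by passing StorageClass='INTELLIGENT_TIERING' when putting the object.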
See also: Announcing S3 Intelligent-Tiering — a New Amazon S3 Storage Class
Related
We have a standard public S3 bucket and folder (Asia Pacific region) which hosts ~30 GB of images/media. The website and app access these images via direct S3 object URLs. Unknowingly, we have run into high data transfer costs, and they are significantly disproportionate:
Amazon Simple Storage Service: USD 30
AWS Data Transfer: USD 110
I have also read that if EC2 and S3 are in the same region the cost will be significantly lower, but the problem is that the S3 objects are accessed directly from client machines anywhere in the world, with no EC2 involved in between.
Can someone please suggest how data transfer costs can be reduced?
The Data Transfer charge is directly related to the amount of information that goes from AWS to the Internet. Depending on your region, it is typically charged at 9c/GB.
If you are concerned about the Data Transfer charge, there are a few things you could do:
Activate Amazon S3 Server Access Logging, which will create a log file for each web request. You can then see how many requests are coming in and possibly detect strange access behaviour (eg bots, search engines, abuse). A sketch for enabling this appears at the end of this answer.
You could try reducing the size of files that are typically accessed, such as making images smaller. Take a look at the Access Logs and determine which objects are being accessed the most, and are therefore causing the most costs.
Use fewer large files on your website (eg videos). Again, look at the Access Logs to determine where the data is being used.
A cost of $110 at 9c/GB suggests about 1.2 TB of data being transferred.
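As a rough sketch of the first suggestion, assuming boto3 and placeholder bucket names (the log bucket must already exist and allow the S3 log delivery service to write to it), enabling server access logging looks something like this:

```python
import boto3

s3 = boto3.client("s3")

# Turn on server access logging for the bucket that serves the images.
# "my-images-bucket" and "my-logs-bucket" are placeholder names.
s3.put_bucket_logging(
    Bucket="my-images-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-logs-bucket",
            "TargetPrefix": "access-logs/",
        }
    },
)
```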
I have created an S3 lifecycle policy which will expire the current version of the object in 540 days.
I am a bit confused here: does it delete the objects from S3 or from Glacier?
If not, I want to delete the objects from the bucket in 540 days and from Glacier in some 4 years! How will I set that up?
Expiring an object means "delete it", regardless of its storage class.
So, if it has moved to a Glacier storage class, it will still be deleted.
When you store data in Glacier via S3, then the object is managed by Amazon S3 (but stored in Glacier). Thus, the lifecycle rules apply.
If, however, you store data directly in Amazon Glacier (without going via Amazon S3), then the data would not be impacted by the lifecycle rules, nor would it be visible in Amazon S3.
Bottom line: Set your rules for deletion based upon the importance of the data, not its current storage class.
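If the intent is to keep objects in S3 for a while, archive them to the Glacier storage class, and only delete them after about four years, a single lifecycle rule can express that. Here is a minimal boto3 sketch; the bucket name is a placeholder and the day counts simply illustrate the 540-day / 4-year idea:

```python
import boto3

s3 = boto3.client("s3")

# One rule: transition current versions to the Glacier storage class after
# 540 days, then expire (delete) them after roughly four years. Expiration
# removes the object wherever it lives, Glacier storage class included.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-delete",
                "Filter": {"Prefix": ""},  # whole bucket
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 540, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": 1460},  # ~4 years
            }
        ]
    },
)
```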
I am in a position where I have a static site hosted in S3 that I need to front with CloudFront. In other words, I have no option but to put CloudFront in front of it. I would like to reduce my S3 costs by changing the objects' storage class to S3 Standard-Infrequent Access (Standard-IA), which would cut my S3 storage costs by around 45%, which is nice since I now have to spend money on CloudFront as well. Is this a good practice, given that the resources will be cached by CloudFront anyway? S3 Standard-IA has 99.9% availability, which means it can have as much as 8.75 hours of downtime per year.
First, don't worry about the downtime. Unless you are using Reduced Redundancy or One-Zone Storage, all data on S3 has pretty much the same redundancy and therefore very high availability.
S3 Standard-IA is pretty much half-price for storage ($0.0125 per GB) compared to S3 Standard ($0.023 per GB). However, data retrieval costs for Standard-IA are $0.01 per GB. Thus, if the data is retrieved more than once per month, Standard-IA works out more expensive.
While using Amazon CloudFront in front of S3 would reduce data access frequency, it's worth noting that CloudFront caches separately in each region. So, if users in Singapore, Sydney and Tokyo all requested the data, it would be fetched three times from S3. So, data stored as Standard-IA would incur 3 x $0.01 per GB charges, making it much more expensive.
See: Announcing Regional Edge Caches for Amazon CloudFront
Bottom line: If the data is going to be accessed at least once per month, it is cheaper to use Standard Storage instead of Standard-Infrequent Access.
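To make the break-even concrete, here is a small calculation using the prices quoted above (which may have changed since this was written); it ignores request charges and data transfer, which apply to both classes:

```python
# Monthly cost per GB, using the prices quoted above (subject to change).
STANDARD_STORAGE = 0.023       # $/GB-month, S3 Standard
STANDARD_IA_STORAGE = 0.0125   # $/GB-month, S3 Standard-IA
STANDARD_IA_RETRIEVAL = 0.01   # $/GB retrieved, S3 Standard-IA only

def monthly_cost_per_gb(retrievals_per_month):
    """Return (Standard cost, Standard-IA cost) per GB for a given retrieval rate."""
    standard = STANDARD_STORAGE
    standard_ia = STANDARD_IA_STORAGE + STANDARD_IA_RETRIEVAL * retrievals_per_month
    return standard, standard_ia

for n in (0, 1, 2, 3):
    std, ia = monthly_cost_per_gb(n)
    cheaper = "Standard-IA" if ia < std else "Standard"
    print(f"{n} retrieval(s)/month: Standard ${std:.4f}, Standard-IA ${ia:.4f} -> {cheaper} is cheaper")
```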
Amazon announced in April 2018 the availability of a new storage class, named One Zone-Infrequent Access, that complements the plain IA option by lowering the cost through the use of only one Availability Zone for storage.
Its site advertises that all storage classes have the so-called 11 9s durability.
My question is how the two options (IA vs One Zone-IA) can have the same (i.e. 11 9s) durability, given that the latter uses one AZ (vs. multiple for the former).
Amazon S3 Standard, S3 Standard-IA, S3 One Zone-IA, and Amazon Glacier are all designed for 99.999999999% durability. Amazon S3 Standard, S3 Standard-IA and Amazon Glacier distribute data across a minimum of three geographically-separated Availability Zones to offer the highest level of resilience to AZ loss. S3 One Zone-IA saves cost by storing infrequently accessed data with lower resilience in a single Availability Zone. Amazon S3 Standard-IA is a good choice for long-term storage of master data that is infrequently accessed. For other infrequently accessed data, such as duplicates of backups or data summaries that can be regenerated, S3 One Zone-IA provides a lower price point.
P.S. I think they are themselves implying that One Zone-IA has lower durability:
One Zone-IA saves cost by storing infrequently accessed data with lower resilience in a single Availability Zone
S3 One Zone IA is just as durable from an engineering standpoint as the other storage classes except in the event that the availability zone where your data is stored is destroyed.
S3 One Zone-IA offers the same high durability†, high throughput, and low latency of Amazon S3 Standard and S3 Standard-IA
...
† Because S3 One Zone-IA stores data in a single AWS Availability Zone, data stored in this storage class will be lost in the event of Availability Zone destruction.
https://aws.amazon.com/s3/storage-classes/
So, the claim being made appears to be that 1ZIA is using the same engineering design as other storage classes -- redundant storage media -- except that everything is physically in a single AZ rather than distributed across multiple zones... so it offers comparable durability... except for the case of a catastrophic event involving that AZ.
It seems that having one AZ instead of two does reduce the availability (a quick downtime calculation follows the quote):
Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA)
Same low latency and high throughput performance of S3 Standard
Designed for durability of 99.999999999% of objects in a single Availability Zone†
Designed for 99.5% availability over a given year
Backed with the Amazon S3 Service Level Agreement for availability
Supports SSL for data in transit and encryption of data at rest
S3 Lifecycle management for automatic migration of objects to other S3 Storage Classes
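To put those design targets in perspective, here is a quick back-of-the-envelope conversion of availability into maximum downtime per year, using the 99.9% figure quoted earlier in this thread for Standard-IA and the 99.5% figure above for One Zone-IA:

```python
# Rough downtime-per-year implied by each availability design target.
HOURS_PER_YEAR = 365 * 24  # 8760

targets = [
    ("S3 Standard-IA (99.9%)", 0.999),
    ("S3 One Zone-IA (99.5%)", 0.995),
]

for label, availability in targets:
    downtime_hours = (1 - availability) * HOURS_PER_YEAR
    print(f"{label}: up to {downtime_hours:.1f} hours of downtime per year")
    # -> about 8.8 hours for 99.9%, about 43.8 hours for 99.5%
```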
Is there a way to set an expiry date in Amazon Glacier? I want to copy in weekly backup files, but I don't want to hang on to more than one year's worth.
Can the files be set to "expire" after one year, or is this something I will have to do manually?
While not available natively within Amazon Glacier, AWS has recently enabled Archiving Amazon S3 Data to Amazon Glacier, which makes working with Glacier much easier in the first place:
[...] Amazon S3 was designed for rapid retrieval. Glacier, in contrast, trades off retrieval time for cost, providing storage for as little as $0.01 per Gigabyte per month while retrieving data within three to five hours.

How would you like to have the best of both worlds? How about rapid retrieval of fresh data stored in S3, with automatic, policy-driven archiving to lower cost Glacier storage as your data ages, along with easy, API-driven or console-powered retrieval? [emphasis mine]
[...] You can now use Amazon Glacier as a storage option for Amazon S3.
This is enabled by Amazon S3 Object Lifecycle Management, which not only drives the mentioned Object Archival (Transition Objects to the Glacier Storage Class) but also includes optional Object Expiration, which allows you to achieve what you want, as outlined in the section Before You Decide to Expire Objects within Lifecycle Configuration Rules:
The Expiration action deletes objects

You might have objects in Amazon S3 or archived to Amazon Glacier. No matter where these objects are, Amazon S3 will delete them. You will no longer be able to access these objects. [emphasis mine]
So at the small price of having your objects stored in S3 for a short time (which actually eases working with Glacier a lot due to removing the need to manage archives/inventories) you gain the benefit of optional automatic expiration.
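Applied to the weekly-backup use case in the question, a minimal boto3 sketch might look like the following; the bucket name and prefix are placeholders, and the day counts are just one reasonable choice for "archive soon, delete after a year":

```python
import boto3

s3 = boto3.client("s3")

# Archive weekly backups to the Glacier storage class shortly after upload,
# and delete them once they are a year old.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-backup-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "weekly-backups-one-year",
                "Filter": {"Prefix": "weekly-backups/"},  # placeholder prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 1, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```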
You can do this in the AWS Command Line Interface.
http://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html