I have an S3 bucket on which I've configured a Lifecycle policy which says to archive all objects in the bucket after 1 day(s) (since I want to keep the files in there temporarily but if there are no issues then it is fine to archive them and not have to pay for the S3 storage)
However I have noticed there are some files in that bucket that were created in February ..
So .. am I right in thinking that if you select 'Archive' as the lifecycle option, that means "copy-to-glacier-and-then-delete-from-S3"? In which case this issue of the files left from February would be a fault - since they haven't been?
Only I saw there is another option - 'Archive and then Delete' - but I assume that means "copy-to-glacier-and-then-delete-from-glacier" - which I don't want.
Has anyone else had issues with S3 -> Glacier?
What you describe sounds normal. Check the storage class of the objects.
The correct way to understand the S3/Glacier integration is the S3 is the "customer" of Glacier -- not you -- and Glacier is a back-end storage provider for S3. Your relationship is still with S3 (if you go into Glacier in the console, your stuff isn't visible there, if S3 put it in Glacier).
When S3 archives an object to Glacier, the object is still logically "in" the bucket and is still an S3 object, and visible in the S3 console, but can't be downloaded from S3 because S3 has migrated it to a different backing store.
The difference you should see in the console is that objects will have A "storage class" of Glacier instead of the usual Standard or Reduced Redundancy. They don't disappear from there.
To access the object later, you ask S3 to initiate a restore from Glacier, which S3 does... but the object is still in Glacier at that point, with S3 holding a temporary copy, which it will again purge after some number of days.
Note that your attempt at saving may be a little bit off target if you do not intend to keep these files for 3 months, because any time you delete an object from Glacier, you are billed for the remainder of the three months, if that object has been in Glacier for a shorter time than that.
Related
I have uploaded 365 files 1 files per day to S3 bucket all at one go. Now All the files have the same upload date. I want to Move the file which are more than 6 months to S3 Glacier. S3 lifecycle policy will take effect after 6 months as all the files upload date to s3 is same. The actual date of the files upload is stored in DynamoDb table with S3KeyUrl.
I want to know the best way to be able to move file to s3 Glacier. I came up with the following approach
Create the S3 Lifecycle policy to move file to s3 Glacier which will work after 6 month.
Create a app to Query DynamoDB Table to get the list of files which are more than 6 months and
download the file from s3 (as it allows uploading files from local directory) and use
ArchiveTransferManager (Amazon.Glacier.Transfer) to the file to s3 glacier vault.
In Prod Scenario there will be files in some 10 million so the solution should be reliable.
There are two versions of Glacier:
The 'original' Amazon Glacier, which uses Vaults and Archives
The Amazon S3 Storage Classes of Glacier and Glacier Deep Archive
Trust me... You do not want to use the 'original' Glacier. It is slow and difficult to use. So, avoid anything that mentions Vaults and Archives.
Instead, you simply want to change the Storage Class of the objects in Amazon S3.
Normally, the easiest way to do this is to "Edit storage class" in the S3 management console. However, you mention Millions of objects, so this wouldn't be feasible.
Instead, you will need to copy objects over themselves, while changing the storage class. This can be done with the AWS CLI:
aws s3 cp s3://<bucket-name>/ s3://<bucket-name>/ --recursive --storage-class <storage_class>
Note that this would change the storage class for all objects in the given bucket/path. Since you only wish to selectively change the storage class, you would either need to issue lots of the above commands (each for only one object), or you could use an AWS SDK to script the process. For example, you could write a Python program that loops through the list of objects, checks DynamoDB to determine whether the object is '6 months old' and then copies it over itself with the new Storage Class.
See: StackOverflow: How to change storage class of existing key via boto3
If you have millions of objects, it can take a long time to merely list the objects. Therefore, you could consider using Amazon S3 Inventory, which can provide a daily or weekly CSV file listing all objects. You could then use this CSV file as the 'input list' for your 'copy' operation rather than having to list the bucket itself.
Or, just be lazy (which is always more productive!) and archive everything to Glacier. Then, if somebody actually needs one of the files in the next 6 months, simply restore it from Glacier before use. So simple!
I have 50TB of data in an S3 Standard bucket.
I want to transition objects that are greater than 100MB & older than 30 days to AWS Glacier using an S3 Lifecycle Policy.
How can I only transition objects that are greater than 100MB in size?
There is no way to transition items based on file size.
As the name suggests, S3 Lifecycle policies allow you to specify transition actions based on object lifetime - not file size - to move items from the S3 Standard storage class to S3 Glacier.
Now, a really inefficient & costly way that may be suggested would be to schedule a Lambda to check the S3 bucket daily, see if anything is 30 days old & then "move" items to Glacier.
However, the Glacier API does not allow you to move items from S3 Standard to Glacier unless it is through a lifecycle policy.
This means you will need to download the S3 object and then re-upload the item again to Glacier.
I would still advise having a Lambda running daily to check the file size of items, however, create another folder (key) called archive for example. If there are any items older than 30 days & greater than 100MB, copy the item from the current folder to the archive folder and then delete the original item.
Set a 0-day life-cycle policy, filtered on the prefix of the other folder (archive), which then transitions the items to Glacier ASAP.
This way, you will be able to transfer items larger than 100MB after 30 days, without paying higher per-request charges associated with uploading items to Glacier, which may even cost you more than the savings you were aiming for in the first place.
To later transition the object(s) back from Glacier to S3 Standard, use the RestoreObject API (or SDK equivalent) to restore it back into the original folder. Then finally, delete the object from Glacier using a DELETE request to the archive URL.
create a lambda that runs every day (cron job) that checks for files older than 30 days and greater then 100mb in the bucket. You can use the s3 api and glacier api.
In the "Lifecycle rule configuration" there is (from Nov 23, 2021 - see References 1.) a "Object size" form field on which you can specify both the minimum and the maximum object size.
For the sake of completeness, by default Amazon S3 does not transition objects that are smaller than 128 KB for the following transitions:
From the S3 Standard or S3 Standard-IA storage classes to S3 Intelligent-Tiering or S3 Glacier Instant Retrieval.
From the S3 Standard storage class to S3 Standard-IA or S3 One Zone-IA
References:
https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-s3-lifecycle-storage-cost-savings/
https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-transition-general-considerations.html
I have created a lifecycle policy for one of my buckets as below:
Name and scope
Name MoveToGlacierAndDeleteAfterSixMonths
Scope Whole bucket
Transitions
For previous versions of objects Transition to Amazon Glacier after 1 days
Expiration Permanently delete after 360 days
Clean up incomplete multipart uploads after 7 days
I would like to get answer for the following questions:
When would the data be deleted from s3 as per this policy ?
Do i have to do anything on the glacier end inorder to move my s3 bucket to glacier ?
My s3 bucket is 6 years old and all the versions of the bucket are even older. But i am not able to see any data in the glacier console though my transition policy is set to move to glacier after 1 day from the creation of the data. Please explain this behavior.
Does this policy affect only new files which will be added to the bucket post lifepolicy creation or does this affect all the files in s3 bucket ?
Please answer these questions.
When would the data be deleted from s3 as per this policy ?
Never, for current versions. A lifecycle policy to transition objects to Glacier doesn't delete the data from S3 -- it migrates it out of S3 primary storage and over into Glacier storage -- but it technically remains an S3 object.
Think of it as S3 having its own Glacier account and storing data in that separate account on your behalf. You will not see these objects in the Glacier console -- they will remain in the S3 console, but if you examine an object that has transitioned, is storage class will change from whatever it was, e.g. STANDARD and will instead say GLACIER.
Do i have to do anything on the glacier end inorder to move my s3 bucket to glacier ?
No, you don't. As mentioned above, it isn't "your" Glacier account that will store the objects. On your AWS bill, the charges will appear under S3, but labeled as Glacier, and the price will be the same as the published pricing for Glacier.
My s3 bucket is 6 years old and all the versions of the bucket are even older. But i am not able to see any data in the glacier console though my transition policy is set to move to glacier after 1 day from the creation of the data. Please explain this behavior.
Two parts: first, check the object storage class displayed in the console or with aws s3api list-objects --output=text. See if you don't see some GLACIER-class objects. Second, it's a background process. It won't happen immediately but you should see things changing within 24 to 48 hours of creating the policy. If you have logging enabled on your bucket, I believe the transition events will also be logged.
Does this policy affect only new files which will be added to the bucket post lifepolicy creation or does this affect all the files in s3 bucket ?
This affects all objects in the bucket.
I have a website where I serve content that is stored on an AWS S3 bucket. As the amount of content grows, I have started thinking about back-up options. Using AWS Glacier came up as a natural route.
After reading on it, I didn't understand if it does what I intend to do with it. From what I have understood, using Glacier, you set lifecycle policies on objects stored on your S3 buckets. According to these policies, objects will be transferred Glacier and deleted from your S3 bucket at a specific point in time after they have been uploaded to S3. At this point, the object's storage class changes to 'GLACIER'. Amazon explains that, once this is done, you can no longer access the objects through S3 but "their index entry will remain as is". Simultaneously, they say that retrieval of objects from Glacier takes 3-5 hours.
My question is: Does this mean that, once objects are transferred to Glacier, I will not be able to serve them on my website without retrieving them first? Or does it mean that they will still be served from the S3 bucket as usual but that, in case something happens with the files on S3 I will just be able to retrieve them in 3-5 hours? Glacier would only be a viable back up solution for me if users of my website would still be able to load content on the website after the correspondent objects are transferred to Glacier. Also, is it possible to have objects transferred to Glacier without them being deleted from the S3 bucket?
Thank you
To answer your question: Does this mean that, once objects are transferred to Glacier, I will not be able to serve them on my website without retrieving them first?
No, you won't be able to serve them on your website unless transfer them from glacier to standard or standard_IA class, which is taken 3-5 hours. Glacier is generally used to archive cold data like old logs which is accessed in rare condition. So if you need real-time access to the object, Glacier isn't a valid option for you.
A while ago when the price difference between standard storage and glacier was closer to 10:1 than 3:1, I moved a couple of bucket completely to Glacier using a life-cycle policy. Admittedly, I hadn't investigated how to reverse that process permanently.
I know the documentation states that I would have to "use the copy operation to overwrite the object as a Standard or RRS object", but I guess I'm unclear what that looks like. Do I just copy and paste within that bucket?
Restoration can only be done at the Object level, not Bucket or Folder. When an object is restored, it is a available in Amazon S3 for the period requested (eg 2 days), then reverts to only being in Glacier. You'll need to copy the objects out of the bucket during that time to keep a copy of them in S3.
The easiest method would be to use a utility to restore multiple Objects for you, eg:
How to restore whole directories from Glacier?
Also, see Stackoverflow: How to restore folders (or entire buckets) to Amazon S3 from Glacier?