I have created a lifecycle policy for one of my buckets as below:
Name and scope
Name MoveToGlacierAndDeleteAfterSixMonths
Scope Whole bucket
Transitions
For previous versions of objects: transition to Amazon Glacier after 1 day
Expiration Permanently delete after 360 days
Clean up incomplete multipart uploads after 7 days
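For reference, the console rule above corresponds roughly to a lifecycle configuration like this (a sketch; I'm assuming the "Permanently delete" expiration applies to previous versions, since the bucket is versioned):

```json
{
  "Rules": [
    {
      "ID": "MoveToGlacierAndDeleteAfterSixMonths",
      "Status": "Enabled",
      "Filter": {},
      "NoncurrentVersionTransitions": [
        { "NoncurrentDays": 1, "StorageClass": "GLACIER" }
      ],
      "NoncurrentVersionExpiration": { "NoncurrentDays": 360 },
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    }
  ]
}
```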
I would like answers to the following questions:
When would the data be deleted from S3 as per this policy?
Do I have to do anything on the Glacier end in order to move my S3 bucket to Glacier?
My S3 bucket is 6 years old and all the versions in the bucket are even older, but I am not able to see any data in the Glacier console, even though my transition policy is set to move data to Glacier 1 day after its creation. Please explain this behavior.
Does this policy affect only new files added to the bucket after the lifecycle policy's creation, or does it affect all the files in the S3 bucket?
When would the data be deleted from S3 as per this policy?
For current versions: never, by the transition itself. A lifecycle policy to transition objects to Glacier doesn't delete the data from S3 -- it migrates it out of S3 primary storage and over into Glacier storage -- but it technically remains an S3 object. Deletion happens only through the separate expiration setting (here, permanent deletion after 360 days).
Think of it as S3 having its own Glacier account and storing data in that separate account on your behalf. You will not see these objects in the Glacier console -- they will remain in the S3 console, but if you examine an object that has transitioned, its storage class will have changed from whatever it was (e.g. STANDARD) to GLACIER.
Do I have to do anything on the Glacier end in order to move my S3 bucket to Glacier?
No, you don't. As mentioned above, it isn't "your" Glacier account that will store the objects. On your AWS bill, the charges will appear under S3, but labeled as Glacier, and the price will be the same as the published pricing for Glacier.
My S3 bucket is 6 years old and all the versions in the bucket are even older, but I am not able to see any data in the Glacier console, even though my transition policy is set to move data to Glacier 1 day after its creation. Please explain this behavior.
Two parts: first, check the object storage class displayed in the console or with aws s3api list-objects --output=text, and see whether some objects show the GLACIER storage class. Second, it's a background process. It won't happen immediately, but you should see things changing within 24 to 48 hours of creating the policy. If you have logging enabled on your bucket, I believe the transition events will also be logged.
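To scan programmatically rather than eyeballing the console, you could group keys by storage class from the listing responses. A minimal sketch, assuming boto3; the bucket name is a placeholder:

```python
from collections import defaultdict

def group_by_storage_class(objects):
    """Group object keys by StorageClass, as found in ListObjects-style
    records (S3 reports STANDARD objects with that class explicitly,
    but default to it defensively when the field is absent)."""
    groups = defaultdict(list)
    for obj in objects:
        groups[obj.get("StorageClass", "STANDARD")].append(obj["Key"])
    return dict(groups)

# With boto3 (assumed), feed it pages from list_objects_v2:
# import boto3
# s3 = boto3.client("s3")
# for page in s3.get_paginator("list_objects_v2").paginate(Bucket="my-bucket"):
#     print(group_by_storage_class(page.get("Contents", [])))
```

Any keys under the GLACIER class confirm the transition has started.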
Does this policy affect only new files added to the bucket after the lifecycle policy's creation, or does it affect all the files in the S3 bucket?
This affects all objects in the bucket.
Related
I have 50TB of data in an S3 Standard bucket.
I want to transition objects that are greater than 100MB & older than 30 days to AWS Glacier using an S3 Lifecycle Policy.
How can I only transition objects that are greater than 100MB in size?
There is no way to transition items based on file size.
As the name suggests, S3 Lifecycle policies allow you to specify transition actions based on object lifetime - not file size - to move items from the S3 Standard storage class to S3 Glacier.
Now, a really inefficient & costly way that may be suggested would be to schedule a Lambda to check the S3 bucket daily, see if anything is 30 days old & then "move" items to Glacier.
However, the Glacier API does not allow you to move items from S3 Standard to Glacier unless it is through a lifecycle policy.
This means you will need to download the S3 object and then re-upload the item again to Glacier.
I would still advise running a Lambda daily to check the size of items, but with a twist: create another folder (prefix) called archive, for example. If there are any items older than 30 days and greater than 100 MB, copy them from the current folder to the archive folder and then delete the originals.
Set a 0-day life-cycle policy, filtered on the prefix of the other folder (archive), which then transitions the items to Glacier ASAP.
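Such a prefix-filtered 0-day rule might look like this (a sketch; the rule ID and archive/ prefix are the example names from above):

```json
{
  "Rules": [
    {
      "ID": "ArchivePrefixToGlacier",
      "Status": "Enabled",
      "Filter": { "Prefix": "archive/" },
      "Transitions": [
        { "Days": 0, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
```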
This way, you will be able to transfer items larger than 100MB after 30 days, without paying higher per-request charges associated with uploading items to Glacier, which may even cost you more than the savings you were aiming for in the first place.
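The daily Lambda's selection logic could be sketched like this (assuming boto3; the bucket name, thresholds, and helper name are illustrative):

```python
from datetime import datetime, timedelta, timezone

def eligible_for_archive(objects, now, min_size=100 * 1024 * 1024, min_age_days=30):
    """Return keys of objects larger than min_size bytes and older than
    min_age_days, given ListObjects-style records with 'Key', 'Size'
    and 'LastModified'."""
    cutoff = now - timedelta(days=min_age_days)
    return [o["Key"] for o in objects
            if o["Size"] > min_size and o["LastModified"] < cutoff]

# In the Lambda (sketch, boto3 assumed), copy each eligible key into the
# archive/ prefix and delete the original:
# s3 = boto3.client("s3")
# for key in eligible_for_archive(page["Contents"], datetime.now(timezone.utc)):
#     s3.copy_object(Bucket=BUCKET, Key="archive/" + key,
#                    CopySource={"Bucket": BUCKET, "Key": key})
#     s3.delete_object(Bucket=BUCKET, Key=key)
```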
To later transition the object(s) back from Glacier to S3 Standard, use the RestoreObject API (or SDK equivalent) to restore it back into the original folder. Then finally, delete the object from Glacier using a DELETE request to the archive URL.
Create a Lambda that runs every day (cron job) and checks for files older than 30 days and greater than 100 MB in the bucket. You can use the S3 API and Glacier API.
In the "Lifecycle rule configuration" there is (as of Nov 23, 2021; see reference 1) an "Object size" form field on which you can specify both the minimum and the maximum object size.
For the sake of completeness, by default Amazon S3 does not transition objects that are smaller than 128 KB for the following transitions:
From the S3 Standard or S3 Standard-IA storage classes to S3 Intelligent-Tiering or S3 Glacier Instant Retrieval.
From the S3 Standard storage class to S3 Standard-IA or S3 One Zone-IA
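Using that filter, a rule transitioning only objects larger than 100 MB after 30 days might look like this (a sketch; the rule ID is a placeholder, and ObjectSizeGreaterThan is in bytes):

```json
{
  "Rules": [
    {
      "ID": "LargeObjectsToGlacier",
      "Status": "Enabled",
      "Filter": { "ObjectSizeGreaterThan": 104857600 },
      "Transitions": [
        { "Days": 30, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
```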
References:
https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-s3-lifecycle-storage-cost-savings/
https://docs.aws.amazon.com/AmazonS3/latest/userguide/lifecycle-transition-general-considerations.html
I have an S3 bucket with around 80 objects, which I can confirm from a CloudWatch metric. There are no prefixes/folders; all objects are in the root path of the bucket.
When I do aws s3 ls bucket, it only shows the current month's objects, not all of them and not the previous months' objects. Even in the AWS S3 console it is the same. I even tried aws s3 ls bucket --recursive.
In the console, I see "viewing 1 of 24", but there are no buttons to navigate to see older objects.
Why is that? How can I see all objects in my bucket?
My S3 bucket's storage class is Standard.
The CloudWatch metric NumberOfObjects shows current and noncurrent objects and the total number of parts for all incomplete multipart uploads to the bucket.
You probably have versioning enabled on the bucket. aws s3 ls only lists current versions; this command doesn't return noncurrent versions of an object. You can click "Show all" in the S3 console to see versioned objects, or use list-object-versions to get the total number of objects.
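To tally current vs. noncurrent versions yourself, you could split the list-object-versions output on the IsLatest flag. A minimal sketch, assuming boto3 and a placeholder bucket name:

```python
def count_versions(versions):
    """Split ListObjectVersions-style records into current vs noncurrent
    counts, based on the IsLatest flag."""
    current = sum(1 for v in versions if v.get("IsLatest"))
    return {"current": current, "noncurrent": len(versions) - current}

# With boto3 (assumed):
# import boto3
# s3 = boto3.client("s3")
# resp = s3.list_object_versions(Bucket="my-bucket")
# print(count_versions(resp.get("Versions", [])))
```

If the "noncurrent" count plus the "current" count lands near 80, that explains the CloudWatch number.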
Reference:
Get all versions of an object in an AWS S3 bucket?
https://www.npmjs.com/package/s3-list-all-objects
You say that you trust the number from the CloudWatch metric, NumberOfObjects. Here is its definition from the S3 docs:
The total number of objects stored in a bucket for all storage classes except for the GLACIER storage class. This value is calculated by counting all objects in the bucket (both current and noncurrent objects) and the total number of parts for all incomplete multipart uploads to the bucket.
So the discrepancy between what you are viewing in the console and the metric is probably that you have versioning on, and there are old ("noncurrent") versions being counted.
There are instructions for seeing the old versions here
I am able to transfer files from S3 to Glacier after 30 days using a lifecycle rule. However, how do I make the same files get deleted from Glacier after 3 months?
If the objects were moved from S3 to Glacier via a lifecycle policy, add an expiration setting to the same lifecycle policy to permanently delete the objects after n days. This will delete the objects from both S3 and Glacier.
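Combining the two might look like this (a sketch; the rule ID is a placeholder, and 90 days approximates "3 months"):

```json
{
  "Rules": [
    {
      "ID": "GlacierThenDelete",
      "Status": "Enabled",
      "Filter": {},
      "Transitions": [
        { "Days": 30, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 90 }
    }
  ]
}
```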
If, instead, the objects were uploaded directly to Glacier, then there is no auto-deletion capability.
As far as I'm aware, Glacier does not currently have lifecycle policies for Glacier vaults like it does for S3.
You could create your own autodelete setup (likely within the not-expiring-after-12-months AWS Free Tier) by writing metadata about the Glacier archives to DynamoDB (vault name, archive id, timestamp) and have a scheduled Lambda function that looks for archives older than 30 days and deletes them from Glacier and DynamoDB.
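The selection step of that scheduled Lambda could be sketched like this (assuming boto3 and the DynamoDB item shape described above; the field names are illustrative):

```python
from datetime import datetime, timedelta, timezone

def archives_to_delete(records, now, max_age_days=30):
    """Given DynamoDB-style items with 'vault', 'archive_id' and a
    'created' timestamp, return those older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [r for r in records if r["created"] < cutoff]

# The Lambda (sketch, boto3 assumed) would then, for each returned record:
# glacier = boto3.client("glacier")
# glacier.delete_archive(vaultName=r["vault"], archiveId=r["archive_id"])
# ...and delete the corresponding DynamoDB item.
```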
It's a bit of work to set up, but it would accomplish what you're trying to do.
Amazon Athena Log Analysis Services with S3 Glacier
We have petabytes of data in S3. We are PubNub (https://www.pubnub.com/), and we store usage data for our network in S3 for billing purposes. We have tab-delimited log files stored in an S3 bucket. Athena is giving us a HIVE_CURSOR_ERROR failure.
Our S3 bucket is setup to automatically push to AWS Glacier after 6 months. Our bucket has S3 files hot and ready to read in addition to the Glacier backup files. We are getting access errors from Athena because of this. The file referenced in the error is a Glacier backup.
My guess is the answer will be: don't keep Glacier backups in the same bucket. That option is not easy for us due to our data volume. I believe Athena will not work in this setup and we will not be able to use Athena for our log analysis.
However, if there is a way we can use Athena, we would be thrilled. Is there a solution to HIVE_CURSOR_ERROR, and a way to skip Glacier files? Our S3 bucket is a flat bucket without folders.
The S3 object name is omitted from the screenshots. The file referenced in the HIVE_CURSOR_ERROR is in fact the Glacier object, as confirmed in a screenshot of our S3 bucket.
Note I tried to post on https://forums.aws.amazon.com/ but that was no bueno.
The AWS documentation dated May 16, 2017 states specifically that Athena does not support the GLACIER storage class:
Athena does not support different storage classes within the bucket specified by the LOCATION clause, does not support the GLACIER storage class, and does not support Requester Pays buckets. For more information, see Storage Classes, Changing the Storage Class of an Object in Amazon S3, and Requester Pays Buckets in the Amazon Simple Storage Service Developer Guide.
We are also interested in this; if you get it to work, please let us know how. :-)
Since the release of February 18, 2019, Athena ignores objects with the GLACIER storage class instead of failing the query:
[…] As a result of fixing this issue, Athena ignores objects transitioned to the GLACIER storage class. Athena does not support querying data from the GLACIER storage class.
You must have an S3 bucket to work with. In addition, the AWS account that you use to initiate a S3 Glacier Select job must have write permissions for the S3 bucket. The Amazon S3 bucket must be in the same AWS Region as the vault that contains the archive object that is being queried.
S3 Glacier Select runs the query and stores the results in an S3 bucket.
Bottom line: you must move the data into an S3 bucket to use the S3 Glacier Select statement, then use Athena on the "new" S3 bucket.
I have an S3 bucket on which I've configured a Lifecycle policy which says to archive all objects in the bucket after 1 day(s) (since I want to keep the files in there temporarily but if there are no issues then it is fine to archive them and not have to pay for the S3 storage)
However, I have noticed there are some files in that bucket that were created in February.
So, am I right in thinking that if you select 'Archive' as the lifecycle option, it means "copy to Glacier and then delete from S3"? In which case the files left from February would indicate a fault, since they haven't been deleted?
Only I saw there is another option - 'Archive and then Delete' - but I assume that means "copy-to-glacier-and-then-delete-from-glacier" - which I don't want.
Has anyone else had issues with S3 -> Glacier?
What you describe sounds normal. Check the storage class of the objects.
The correct way to understand the S3/Glacier integration is that S3 is the "customer" of Glacier -- not you -- and Glacier is a back-end storage provider for S3. Your relationship is still with S3; if you go into Glacier in the console, your stuff isn't visible there, even if S3 put it in Glacier.
When S3 archives an object to Glacier, the object is still logically "in" the bucket and is still an S3 object, and visible in the S3 console, but can't be downloaded from S3 because S3 has migrated it to a different backing store.
The difference you should see in the console is that objects will have a storage class of Glacier instead of the usual Standard or Reduced Redundancy. They don't disappear from there.
To access the object later, you ask S3 to initiate a restore from Glacier, which S3 does... but the object is still in Glacier at that point, with S3 holding a temporary copy, which it will again purge after some number of days.
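That restore request can be issued with the RestoreObject API. A minimal sketch, assuming boto3; the bucket, key, and helper name are placeholders:

```python
def build_restore_request(days, tier="Standard"):
    """Build the RestoreRequest payload for S3's RestoreObject API.
    'days' is how long S3 keeps the temporary copy; Tier is one of
    Expedited, Standard, or Bulk."""
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}

# With boto3 (assumed):
# import boto3
# s3 = boto3.client("s3")
# s3.restore_object(Bucket="my-bucket", Key="my-key",
#                   RestoreRequest=build_restore_request(days=7))
```

Once the job completes, the object is downloadable from S3 again until the temporary copy expires.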
Note that your attempt at saving may be a little off target if you do not intend to keep these files for 3 months: any time you delete an object that has been in Glacier for less than three months, you are billed for the remainder of the three months.