I have an AWS S3 bucket with a 1-day expiration lifecycle rule, but AWS didn't delete the expired objects automatically. How can I get AWS to delete the expired objects automatically? Is there still an additional cost for these expired but unremoved objects?
No. AWS evaluates the expiration rules once each day, and you will not be charged for the expired objects, as stated in the doc below:
We evaluate the expiration rules once each day. During this time,
based on their expiration dates, any object found to be expired will
be queued for removal. You will not be billed for any associated
storage for those objects on or after their expiration date. If server
access logging has been enabled for that S3 bucket, an
S3.EXPIRE.OBJECT record will be generated when an object is removed.
https://aws.amazon.com/blogs/aws/amazon-s3-object-expiration/
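If you want to confirm that the expiration rule is actually active on the bucket, a minimal boto3 sketch (the bucket name is a placeholder) could look like this:

```python
import boto3

s3 = boto3.client("s3")

# Fetch the lifecycle configuration; the bucket name is a placeholder.
config = s3.get_bucket_lifecycle_configuration(Bucket="my-bucket")

for rule in config["Rules"]:
    # Expired objects are only queued for deletion when the rule is Enabled.
    print(rule.get("ID", "(no id)"), rule["Status"], rule.get("Expiration"))
```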
Please be aware that AWS only guarantees that you will not be billed for expired objects. AWS gives no guarantee as to when expired objects will actually be cleaned up. In my experience this usually happens within a day of expiring, but this week I ran into a production issue because one of my Lambdas choked on the number of objects in my S3 bucket, which is normally kept small by the Lifecycle Policy I set on the bucket. An inquiry at AWS Support confirmed this:
Thank you for the provided information, I have checked the LifeCycle Management for the eu-west-1 region and have noted that there has been a recent increase in the number of objects to be expired.
As the LifeCycle Manager is asynchronous for all S3 Buckets in the region, increases in the number of objects to be expired can introduce longer delays [1].
So please be aware of this when you're crafting your AWSome solutions!
[1] https://docs.aws.amazon.com/AmazonS3/latest/dev/lifecycle-expire-general-considerations.html
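Because expired objects can linger like this, one defensive option on the consuming side is to filter out anything older than the expiration window yourself, rather than trusting that lifecycle cleanup has already run. A rough boto3 sketch, assuming a 1-day expiration rule and a placeholder bucket name:

```python
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")
cutoff = datetime.now(timezone.utc) - timedelta(days=1)  # matches a 1-day expiration rule

paginator = s3.get_paginator("list_objects_v2")
fresh_keys = []
for page in paginator.paginate(Bucket="my-bucket"):  # bucket name is a placeholder
    for obj in page.get("Contents", []):
        # Skip objects that have logically expired but have not been removed yet.
        if obj["LastModified"] > cutoff:
            fresh_keys.append(obj["Key"])

print(f"{len(fresh_keys)} non-expired objects")
```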
Reading about Bucket Locks in Cloud Storage made me think of something very evil and bad that one could do:
Create a Cloud Storage Bucket.
Set a retention policy of 100 years on the bucket.
Lock the retention policy to the bucket.
Upload many petabytes of objects to the bucket.
The project is now stuck with a bucket that cannot be deleted for 100 years and the project can never be deleted either due to a "lien". And theoretically, someone is stuck paying the bill to store the petabytes. For 100 years.
Is there any way, preferably programmatically or through configuration, to prevent users from locking a retention policy on a bucket while still permitting them to create and manage every other aspect of Cloud Storage buckets?
The blunter permission system doesn't seem fine-grained enough to permit or deny locking:
https://cloud.google.com/storage/docs/access-control/iam-json
I'm thinking there's some way to use IAM Conditions to accomplish what I want, but I'm not sure how.
Update: I'm looking for a solution that does not force a retention policy to be set. John Hanley's organization policy constraint solution is interesting, but it forces a retention policy of at least 1 second to be set across all applicable projects, and it also disables the option to have versioning enabled on the bucket.
A forced retention of 1 second can cause certain issues with applications that write and delete objects at the same key multiple times a second.
FWIW, AWS identifies these kinds of radioactive waste creation actions and lets policies be set on them accordingly.
Method 1:
Select or create a custom role for bucket users that does not have the permission resourcemanager.projects.updateLiens. That permission is required to create a Retention Policy.
Method 2:
This method has side effects such as not supporting object versioning but can prevent a long bucket lock such as 100 years.
You can set an Organization Policy Constraint to limit the maximum duration of a Retention Policy.
Name: constraints/storage.retentionPolicySeconds
Description: Retention policy duration in seconds
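As a complement to either method, you can also audit existing buckets for long or locked retention policies with the google-cloud-storage Python client. A minimal sketch, where the one-year threshold is just an example value:

```python
from google.cloud import storage

client = storage.Client()

# Flag anything retained longer than roughly a year; the threshold is an example.
MAX_RETENTION_SECONDS = 365 * 24 * 3600

for bucket in client.list_buckets():
    period = bucket.retention_period  # None when no retention policy is set
    if period and (period > MAX_RETENTION_SECONDS or bucket.retention_policy_locked):
        print(f"{bucket.name}: retention={period}s locked={bucket.retention_policy_locked}")
```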
I'm looking for a way to be notified when an object in s3 changes storage class. I thought there would be a bucket event notification for this but I don't see it as an option. How can I know when an object moves from STANDARD to GLACIER? We have systems that depend on objects not being in GLACIER. If they change to GLACIER, we need to be made aware and handle them accordingly.
https://docs.aws.amazon.com/AmazonS3/latest/userguide/notification-how-to-event-types-and-destinations.html#supported-notification-event-types
You can use S3 access logs to capture lifecycle changes, but I think that's about it:
Amazon S3 server access logs can be enabled in an S3 bucket to capture
S3 Lifecycle-related actions such as object transition to another
storage class
Taken from AWS docs - life-cycle and other bucket config
You could certainly roll your own notifications for storage class transitions - it might be a bit more involved than you are hoping for, though. You need a separate bucket to write your access logs to. Set up an S3 notification for object creation in your new logs bucket to trigger a Lambda function that processes each new log file. In your Lambda function, use Athena to query the logs and fire off an SNS alert, or perform some corrective action in code.
There are some limitations to be aware of, though - best-effort logging means you might not get logs for a few hours.
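If you go this route, the log-processing Lambda can stay fairly simple for basic alerting. The sketch below skips Athena and just scans each new log file for lifecycle transition records; the "S3.TRANSITION" operation prefix and the SNS topic are assumptions you would need to verify against your own logs:

```python
import os

import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")

# The SNS topic ARN is a placeholder supplied via an environment variable.
TOPIC_ARN = os.environ["ALERT_TOPIC_ARN"]


def handler(event, context):
    """Triggered by object-created notifications on the access-logs bucket."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

        # Assumption: lifecycle transitions show up as operations such as
        # "S3.TRANSITION_GLACIER.OBJECT"; check your own logs for the exact names.
        hits = [line for line in body.splitlines() if "S3.TRANSITION" in line]
        if hits:
            sns.publish(
                TopicArn=TOPIC_ARN,
                Subject="S3 lifecycle transition detected",
                Message="\n".join(hits[:20]),  # keep the alert body small
            )
```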
Updated 28/5/21
If the logs are on, you should see the various lifecycle operations logged as they happen, +/- a few hours. If not, are you definitely meeting the minimum criteria for transitioning objects to Glacier (e.g. it takes 30 days to transition from STANDARD to GLACIER)?
As for:
The log record for a particular request might be delivered long after
the request was actually processed, or it might not be delivered at
all.
Consider S3's eventual consistency model and the SLA on data durability - there is a possibility of data loss for any object in S3. I think the risk of losing log records is relatively low, but it could happen.
You could also go for a more active approach - use the S3 API from a Lambda function triggered by CloudWatch Events (cron-like scheduling) to scan all the objects in the bucket and act accordingly (send an email, take corrective action, etc.). Bear in mind this might get expensive depending on how often you run the Lambda and how many objects are in your bucket, but low volumes might even fit in the free tier depending on your usage.
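A minimal sketch of that scheduled scan, assuming you just want to alert on anything already in GLACIER (the bucket name and SNS topic ARN are placeholders):

```python
import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")

BUCKET = "my-bucket"                                        # placeholder
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:s3-alerts"  # placeholder


def handler(event, context):
    """Runs on a CloudWatch Events / EventBridge schedule."""
    glacier_keys = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            # The listing reports the storage class of each object.
            if obj.get("StorageClass") in ("GLACIER", "DEEP_ARCHIVE"):
                glacier_keys.append(obj["Key"])

    if glacier_keys:
        sns.publish(
            TopicArn=TOPIC_ARN,
            Subject="Objects transitioned to Glacier",
            Message="\n".join(glacier_keys[:100]),  # cap the alert body
        )
```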
As of Nov 2021 you can now do this via AWS EventBridge.
Simply create a new rule for the S3 bucket that handles the Object Storage Class Changed event.
See https://aws.amazon.com/blogs/aws/new-use-amazon-s3-event-notifications-with-amazon-eventbridge/
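A rough boto3 sketch of that setup, forwarding the events to SNS; the bucket name, rule name, and topic ARN are placeholders:

```python
import json

import boto3

s3 = boto3.client("s3")
events = boto3.client("events")

BUCKET = "my-bucket"                                                    # placeholder
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:storage-class-alerts"  # placeholder

# 1. Turn on EventBridge delivery for the bucket.
s3.put_bucket_notification_configuration(
    Bucket=BUCKET,
    NotificationConfiguration={"EventBridgeConfiguration": {}},
)

# 2. Match "Object Storage Class Changed" events for this bucket.
events.put_rule(
    Name="s3-storage-class-changed",  # placeholder rule name
    EventPattern=json.dumps({
        "source": ["aws.s3"],
        "detail-type": ["Object Storage Class Changed"],
        "detail": {"bucket": {"name": [BUCKET]}},
    }),
)

# 3. Send matching events to an SNS topic (the topic policy must allow EventBridge).
events.put_targets(
    Rule="s3-storage-class-changed",
    Targets=[{"Id": "sns-alert", "Arn": TOPIC_ARN}],
)
```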
I have set up Lifecycle Rules in S3 buckets to expire objects after X days. I found that there is a delay between the expiration date and the date at which Amazon S3 removes an object (source). Is there any S3 event available to know exactly when objects expire? There is a Delete event in S3, which doesn't seem to work.
Thanks
You can find out what day an object is scheduled to expire on via the following method.
To find when an object is scheduled to expire, use the HEAD Object or the GET Object API operations. These API operations return response headers that provide this information.
The delay is simply because the object sits in a queue and is only processed when it reaches the front of the queue.
See more: https://docs.aws.amazon.com/AmazonS3/latest/dev/lifecycle-expire-general-considerations.html
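In boto3, that scheduled expiry shows up as the Expiration field in the HeadObject response. A minimal sketch, where the bucket and key are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Bucket and key are placeholders.
response = s3.head_object(Bucket="my-bucket", Key="some/object.txt")

# Only present when a lifecycle expiration rule applies to the object, e.g.
# 'expiry-date="Fri, 21 Dec 2012 00:00:00 GMT", rule-id="my-rule"'.
print(response.get("Expiration", "no lifecycle expiration applies"))
```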
For example: one object should expire 10 days after its creation date, another after 30 days. How can I ensure this object-level expiration?
I went through some guides that mention rules for the whole bucket, but not for object-specific expiration.
There is no in-built capability to perform daily deletions on a per-object basis, but one way to achieve it would be:
When the objects are created, add a metadata tag with the desired deletion date
Configure Amazon CloudWatch Events to trigger an AWS Lambda function once per day
The AWS Lambda function can look for objects in the bucket that have reached/passed the deletion date stored in metadata
The function can then delete the objects
This would give you the ability to specify daily expirations.
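A minimal sketch of that daily function, assuming the deletion date is stored as an object tag named delete-after in YYYY-MM-DD form (the tag name and bucket are placeholders):

```python
from datetime import date

import boto3

s3 = boto3.client("s3")
BUCKET = "my-bucket"  # placeholder


def handler(event, context):
    """Runs once per day on a CloudWatch Events schedule."""
    today = date.today().isoformat()
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            tags = s3.get_object_tagging(Bucket=BUCKET, Key=obj["Key"])["TagSet"]
            delete_after = next(
                (t["Value"] for t in tags if t["Key"] == "delete-after"), None
            )
            # Delete anything whose tagged date has been reached or passed.
            if delete_after and delete_after <= today:
                s3.delete_object(Bucket=BUCKET, Key=obj["Key"])
```

Note that this makes one GetObjectTagging call per object, so request costs grow with the number of objects in the bucket.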
I set expiration dates on objects through the API. It's now May, and many of these objects expired back in March. All the docs say expired objects will be wiped on a daily basis, but I think something is wrong.
The Expires metadata field is used to control caching of objects in browsers and CDNs. It is not related to actually deleting objects from Amazon S3.
If you wish to automatically delete objects from Amazon S3 after a certain period of time, you should create a Lifecycle Rule.
See: Object Lifecycle Management - Amazon Simple Storage Service
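For completeness, this is roughly what creating such a rule looks like with boto3; the bucket name, prefix, and 30-day window are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Bucket name, prefix, and expiration window are placeholders.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-objects",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},    # empty prefix applies to the whole bucket
                "Expiration": {"Days": 30},  # delete roughly 30 days after creation
            }
        ]
    },
)
```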