I set expiration dates on objects through the API. It's now May and many of these objects expired in March. All the docs say expired objects will be wiped on a daily basis but I think something is wrong.
The Expires metadata field is used to control caching of objects in browsers and CDNs. It is not related to actually deleting objects from Amazon S3.
If you wish to automatically delete objects from Amazon S3 after a certain period of time, you should create a Lifecycle Rule.
See: Object Lifecycle Management - Amazon Simple Storage Service
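For illustration, a minimal boto3 sketch of such a rule (the bucket name, prefix, day count and rule ID are placeholders, not values from your setup):

```python
# Sketch: a lifecycle rule that expires (deletes) objects under a prefix a
# fixed number of days after creation. All names and numbers are illustrative.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-temp-objects",
                "Filter": {"Prefix": "temp/"},
                "Status": "Enabled",
                "Expiration": {"Days": 60},  # delete 60 days after creation
            }
        ]
    },
)
```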
Related
I'm looking for a way to be notified when an object in s3 changes storage class. I thought there would be a bucket event notification for this but I don't see it as an option. How can I know when an object moves from STANDARD to GLACIER? We have systems that depend on objects not being in GLACIER. If they change to GLACIER, we need to be made aware and handle them accordingly.
https://docs.aws.amazon.com/AmazonS3/latest/userguide/notification-how-to-event-types-and-destinations.html#supported-notification-event-types
You can use S3 access logs to capture lifecycle changes, but I think that's about it:
Amazon S3 server access logs can be enabled in an S3 bucket to capture S3 Lifecycle-related actions such as object transition to another storage class
Taken from the AWS docs - lifecycle and other bucket config
You could certainly roll your own notifications for storage class transitions - it might be a bit more involved than you are hoping for, though. You need a separate bucket to write your access logs to. Set up an S3 notification for object creation in your new logs bucket to trigger a Lambda function that processes each new log file. In your Lambda function, use Athena to query the logs and fire off an SNS alert, or perform some corrective action in code.
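As a rough sketch (skipping Athena and just parsing each new log file directly; the "S3.TRANSITION" operation prefix, the topic ARN and the filtering logic are assumptions you should verify against your own logs):

```python
# Hypothetical sketch: Lambda triggered by object-created events on the
# access-logs bucket. It downloads each new log file, looks for lifecycle
# transition records, and publishes an SNS alert.
import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:storage-class-alerts"  # placeholder

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

        # Assumption: lifecycle transition records contain an operation field
        # starting with "S3.TRANSITION" - check your own access logs.
        hits = [line for line in body.splitlines() if "S3.TRANSITION" in line]
        if hits:
            sns.publish(
                TopicArn=TOPIC_ARN,
                Subject="S3 lifecycle transition detected",
                Message="\n".join(hits[:50]),  # cap the message size
            )
```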
There are some limitations to be aware of, though: logging is best effort, which means you might not get logs for a few hours.
Updated 28/5/21
If logging is enabled you should see the various lifecycle operations logged as they happen, give or take a few hours. If not, are you definitely meeting the minimum criteria for transitioning objects to Glacier (e.g. it takes 30 days to transition from STANDARD to GLACIER)?
As for:
The log record for a particular request might be delivered long after the request was actually processed, or it might not be delivered at all.
Consider S3's eventual consistency model and the SLA on data durability - there is a possibility of data loss for any object in S3. I think the risk of losing log records is relatively low, but it could happen.
You could also go for a more active approach - use the S3 API from a Lambda function triggered by CloudWatch Events (cron-like scheduling) to scan all the objects in the bucket and act accordingly (send an email, take corrective action, etc.). Bear in mind this might get expensive depending on how often you run the Lambda and how many objects are in your bucket, but low volumes might even fall within the free tier depending on your usage.
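A minimal sketch of that active approach, assuming boto3 and a placeholder bucket name - the "do something" step is left as a print:

```python
# Scheduled Lambda that lists the bucket and reports any objects already in
# Glacier storage classes. Bucket name and the follow-up action are placeholders.
import boto3

s3 = boto3.client("s3")
BUCKET = "my-bucket"  # placeholder

def handler(event, context):
    glacier_keys = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            # list_objects_v2 returns the storage class of each object
            if obj.get("StorageClass") in ("GLACIER", "DEEP_ARCHIVE"):
                glacier_keys.append(obj["Key"])
    # do something with the list: send an email via SNS/SES, restore, etc.
    print(f"{len(glacier_keys)} objects currently in Glacier storage classes")
    return glacier_keys
```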
As of Nov 2021 you can now do this via AWS EventBridge.
Simply create a new rule on the S3 bucket that handles the Object Storage Class Changed event.
See https://aws.amazon.com/blogs/aws/new-use-amazon-s3-event-notifications-with-amazon-eventbridge/
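A hedged boto3 sketch of such a rule (the bucket name, rule name and target ARN are placeholders, and EventBridge notifications must be enabled on the bucket for these events to flow):

```python
# Sketch of an EventBridge rule matching the S3 "Object Storage Class Changed"
# event and routing it to an SNS topic. All names and ARNs are placeholders.
import json
import boto3

events = boto3.client("events")

pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Storage Class Changed"],
    "detail": {"bucket": {"name": ["my-bucket"]}},  # placeholder bucket
}

events.put_rule(
    Name="s3-storage-class-changed",
    EventPattern=json.dumps(pattern),
    State="ENABLED",
)

# Route matching events to an SNS topic (could equally be Lambda, SQS, ...)
events.put_targets(
    Rule="s3-storage-class-changed",
    Targets=[{"Id": "notify", "Arn": "arn:aws:sns:eu-west-1:123456789012:storage-class-alerts"}],
)
```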
For example: one object should expire 10 days after its creation date, another after 30 days. How can I ensure this object-level expiration?
I went through some guides that mention rules for the whole bucket, but not for object-specific expiration.
There is no built-in capability to perform daily deletions on a per-object basis, but one way to achieve it would be:
When the objects are created, add a metadata tag with the desired deletion date
Configure Amazon CloudWatch Events to trigger an AWS Lambda function once per day
The AWS Lambda function can look for objects in the bucket that have reached/passed the deletion date stored in metadata
The function can then delete the objects
This would give you the ability to specify daily expirations.
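A minimal sketch of that scheduled function, assuming each object was created with an object tag named "delete-on" holding an ISO date (the tag name and bucket name are illustrative, not prescribed):

```python
# Scheduled cleanup: delete every object whose "delete-on" tag date has passed.
import datetime
import boto3

s3 = boto3.client("s3")
BUCKET = "my-bucket"  # placeholder

def handler(event, context):
    today = datetime.date.today()
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            tags = s3.get_object_tagging(Bucket=BUCKET, Key=obj["Key"])["TagSet"]
            delete_on = next((t["Value"] for t in tags if t["Key"] == "delete-on"), None)
            if delete_on and datetime.date.fromisoformat(delete_on) <= today:
                s3.delete_object(Bucket=BUCKET, Key=obj["Key"])
```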
I was asked in an interview: how do you delete S3 files in a bucket 10 minutes after creation, without using the CLI or a script?
Is there any service or option in AWS that can do such a job?
You can use an AWS S3 lifecycle configuration to delete the objects without the use of the CLI or a script.
More details here
The following is an extract from that page:
To manage your objects so that they are stored cost effectively throughout their lifecycle, configure their lifecycle. A lifecycle configuration is a set of rules that define actions that Amazon S3 applies to a group of objects. There are two types of actions:
Transition actions: Define when objects transition to another storage class. For example, you might choose to transition objects to the STANDARD_IA storage class 30 days after you created them, or archive objects to the GLACIER storage class one year after creating them.
Expiration actions: Define when objects expire. Amazon S3 deletes expired objects on your behalf.
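For reference, a hedged boto3 sketch combining both action types from the extract above (bucket name, prefix and day counts are placeholders; note that lifecycle expiration is specified in whole days, so the shortest interval it supports is one day rather than minutes):

```python
# Sketch only: a lifecycle configuration with transition actions and an
# expiration action, mirroring the two action types described above.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "transition-then-expire",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # move to IA after 30 days
                    {"Days": 365, "StorageClass": "GLACIER"},     # archive after a year
                ],
                "Expiration": {"Days": 730},  # delete after two years
            }
        ]
    },
)
```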
I have recently joined a company that uses S3 Buckets for various different projects within AWS. I want to identify and potentially delete S3 Objects that are not being accessed (read and write), in an effort to reduce the cost of S3 in my AWS account.
I read this, which helped me to some extent.
Is there a way to find out which objects are being accessed and which are not?
There is no native way of doing this at the moment, so all the options are workarounds depending on your use case.
You have a few options:
Tag each S3 Object with its last access date (e.g. 2018-10-24). First turn on object-level logging for your S3 bucket and set up CloudWatch Events for CloudTrail. The tag can then be updated by a Lambda function which runs on a CloudWatch Event that is fired on a Get event. Then create a function that runs on a scheduled CloudWatch Event to delete all objects with a date tag prior to today.
Query CloudTrail logs: write a custom function to query the last access times from object-level CloudTrail logs. This could be done with Athena, or with a direct query against S3.
Create a Separate Index, in something like DynamoDB, which you update in your application on read activities.
Use a Lifecycle Policy on the S3 Bucket / key prefix to archive or delete the objects after x days. This is based on upload time rather than last access time, so you could copy the object to itself to reset the timestamp and start the clock again.
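The "copy the object onto itself" trick from that last option can look like this sketch (bucket and key are placeholders; S3 requires something to change on a self-copy, which is why the metadata directive is set to REPLACE):

```python
# "Touch" an object: copying it onto itself resets its Last-Modified timestamp,
# so a lifecycle rule's clock starts again for that object.
import boto3

s3 = boto3.client("s3")

def touch(bucket: str, key: str) -> None:
    """Rewrite an object in place, resetting its Last-Modified timestamp."""
    s3.copy_object(
        Bucket=bucket,
        Key=key,
        CopySource={"Bucket": bucket, "Key": key},
        MetadataDirective="REPLACE",  # required when copying an object onto itself
    )

touch("my-bucket", "path/to/object-i-still-need.csv")  # placeholder names
```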
No objects in Amazon S3 are required by other AWS services, but you might have configured services to use the files.
For example, you might be serving content through Amazon CloudFront, providing templates for AWS CloudFormation or transcoding videos that are stored in Amazon S3.
If you didn't create the files and you aren't knowingly using the files, you can probably delete them. But you would be the only person who would know whether they are necessary.
There is a recent AWS blog post, which I found very interesting, that takes a cost-optimized approach to solving this problem.
Here is the description from AWS blog:
The S3 server access logs capture S3 object requests. These are generated and stored in the target S3 bucket.
An S3 inventory report is generated for the source bucket daily. It is written to the S3 inventory target bucket.
An Amazon EventBridge rule is configured that will initiate an AWS Lambda function once a day, or as desired.
The Lambda function initiates an S3 Batch Operations job to tag objects in the source bucket that must be expired, using the following logic:
Capture the number of days (x) configuration from the S3 Lifecycle configuration.
Run an Amazon Athena query that will get the list of objects from the S3 inventory report and server access logs. Create a delta list with objects that were created earlier than 'x' days, but not accessed during that time.
Write a manifest file with the list of these objects to an S3 bucket.
Create an S3 Batch operation job that will tag all objects in the manifest file with a tag of "delete=True".
The Lifecycle rule on the source S3 bucket will expire all objects that were created more than 'x' days ago and that carry the "delete=True" tag applied via the S3 Batch Operations job.
Expiring Amazon S3 Objects Based on Last Accessed Date to Decrease Costs
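As a sketch of that final piece, the lifecycle rule that expires only the tagged objects might look like this (bucket name, rule ID and the day count standing in for 'x' are placeholders):

```python
# Expire objects older than 'x' days that carry the "delete=True" tag applied
# by the S3 Batch Operations job described above.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-source-bucket",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-unaccessed-tagged-objects",
                "Filter": {"Tag": {"Key": "delete", "Value": "True"}},
                "Status": "Enabled",
                "Expiration": {"Days": 30},  # the 'x' days from the lifecycle configuration
            }
        ]
    },
)
```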
I have an AWS S3 bucket with a 1-day expiration lifecycle rule, but AWS didn't delete the expired objects automatically. How can I get AWS to delete the expired objects automatically? Is there still an additional cost for these expired but unremoved objects?
No. AWS evaluates the expiration rules once a day, and you will not be charged for storage of expired objects, as stated in the doc below.
We evaluate the expiration rules once each day. During this time, based on their expiration dates, any object found to be expired will be queued for removal. You will not be billed for any associated storage for those objects on or after their expiration date. If server access logging has been enabled for that S3 bucket, an S3.EXPIRE.OBJECT record will be generated when an object is removed.
https://aws.amazon.com/blogs/aws/amazon-s3-object-expiration/
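If you want to confirm that a rule has actually marked an object for expiry, a quick sketch using HeadObject (bucket and key are placeholders) - the Expiration field shows the scheduled expiry date and the rule that applies:

```python
# Inspect an object's scheduled lifecycle expiration via HeadObject.
import boto3

s3 = boto3.client("s3")

resp = s3.head_object(Bucket="my-bucket", Key="some/object.txt")  # placeholders
# e.g. 'expiry-date="Fri, 21 May 2021 00:00:00 GMT", rule-id="expire-after-1-day"'
print(resp.get("Expiration"))
```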
Please be aware that AWS only guarantees that you will not be billed for expired objects. AWS gives no guarantee as to when expired objects will actually be cleaned up. In my experience this is usually within a day of expiring, but this week I ran into a production issue because one of my Lambdas choked on the number of objects in my S3 bucket, which is normally kept small by the Lifecycle Policy I set on the bucket. An inquiry to AWS Support confirmed this:
Thank you for the provided information, I have checked the LifeCycle Management for the eu-west-1 region and have noted that there has been a recent increase in the number of objects to be expired.
As the LifeCycle Manager is asynchronous for all S3 Buckets in the region, increases in the number of objects to be expired can introduce longer delays [1].
So please be aware of this when you're crafting your AWSome solutions!
[1] https://docs.aws.amazon.com/AmazonS3/latest/dev/lifecycle-expire-general-considerations.html