I have a task to control the object lifecycle of particular objects in an S3 bucket. E.g. most of the objects should expire and be deleted according to the lifecycle policy, but for some objects I want the expiration never to happen. In Amazon SQS it is possible to control the lifecycle parameters of each individual message, but I can't find such a feature in the docs for S3. Is it possible?
No, it isn't. Lifecycle policies apply to all the objects in the bucket, or to all the objects with a matching prefix. You'd need to set the policy on a specific key prefix, store the objects you want to match the policy under that prefix, and store the other objects under a different prefix. That's the closest thing available, and it's not really all that close.
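For illustration, a minimal boto3 sketch of that prefix-based workaround (the bucket name, the temp/ prefix, and the 30-day window are all placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Only objects stored under the "temp/" prefix match this rule and expire;
# objects stored under any other prefix are left alone.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-temp-objects",
                "Filter": {"Prefix": "temp/"},
                "Status": "Enabled",
                "Expiration": {"Days": 30},
            }
        ]
    },
)
```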
Related
I have a bucket that stores many of my application's logs, and I want to be able to retain certain objects in the bucket. The objects do not have any tags associated with them except for the ones I want to retain, e.g. permanent:true.
How do I set the lifecycle policy so that I can retain the permanent objects while the other objects in the bucket follow the bucket-level lifecycle policy?
In my opinion you can try something like this: add a tag such as permanent:false to all the objects you do not want to retain.
So now you have objects with permanent:true or permanent:false.
Now create a lifecycle rule, choose "Limit the scope of this rule using one or more filters", add the tag permanent:false, and set the action to permanently delete such objects (or their versions, on a versioned bucket).
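A rough boto3 equivalent of that console rule might look like this (the bucket name and the 90-day window are placeholders; the rule simply never matches objects tagged permanent:true):

```python
import boto3

s3 = boto3.client("s3")

# Expire only objects tagged permanent=false; objects tagged permanent=true
# never match this rule and are retained.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-log-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-non-permanent",
                "Filter": {"Tag": {"Key": "permanent", "Value": "false"}},
                "Status": "Enabled",
                "Expiration": {"Days": 90},
                # On a versioned bucket you could additionally add, e.g.:
                # "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
            }
        ]
    },
)
```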
Alternatively, schedule a Lambda that runs every day and deletes the objects that do not have the tag permanent:true.
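A minimal sketch of such a Lambda, assuming a placeholder bucket name; note it calls get_object_tagging once per object, so it can be slow and incur request costs on large buckets:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-log-bucket"  # placeholder

def handler(event, context):
    """Delete every object that is not tagged permanent=true."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            tags = s3.get_object_tagging(Bucket=BUCKET, Key=obj["Key"])["TagSet"]
            tag_map = {t["Key"]: t["Value"] for t in tags}
            if tag_map.get("permanent") != "true":
                s3.delete_object(Bucket=BUCKET, Key=obj["Key"])
```

Scheduling it daily is then just an EventBridge (CloudWatch Events) cron rule targeting the function.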
We need to implement an expiration of X days of all customer data due to contractual obligations. Not too big of a deal, that's about as easy as it gets.
But at the same time, some customers' projects have files with metadata, perhaps dataset definitions, which most definitely DO NOT need to go away. We have free rein to tag or manipulate any of the data in any way we see fit. Since we have 500+ S3 buckets, we need a somewhat global solution.
Ideally, we would simply set an expiration on the bucket and another rule for the metadata/ prefix. Except then we have a rule overlap, and metadata/* files will still get the X-day expiration that's been applied to the entire bucket.
We can forcefully tag all objects NOT in metadata/* with something like allow_expiration = true using Lambda. While not out of the question, I would like to implement something a little more built into S3.
I don't think there's a way to implement what I'm after without using some kind of tagging and external script. Thoughts?
If you've got a free hand on tagging the objects, you can use a prefix and/or a tag filter with S3 Lifecycle.
You can filter objects by key prefix, object tags, or a combination of both (in which case Amazon S3 uses a logical AND to combine the filters).
See Lifecycle Filter Rules
You could automate the creation and management of your lifecycle rules with IaC, for example Terraform.
See S3 Bucket Lifecycle Configuration with Terraform
There's a useful blog on how to manage these dynamically here.
What's more, using tags has a number of additional benefits:
Object tags enable fine-grained access control of permissions. For example, you could grant an IAM user permissions to read-only objects with specific tags.
Object tags enable fine-grained object lifecycle management in which you can specify a tag-based filter, in addition to a key name prefix, in a lifecycle rule.
When using Amazon S3 analytics, you can configure filters to group objects together for analysis by object tags, by key name prefix, or by both prefix and tags.
You can also customize Amazon CloudWatch metrics to display information by specific tag filters.
Source, and more on how to set tags on multiple Amazon S3 objects with a single request.
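For the metadata/ case above, a hedged sketch of the tagging pass (the bucket name and the allow_expiration tag are placeholders taken from the question; note that put_object_tagging replaces any existing tags on an object, and for very large buckets S3 Batch Operations with an inventory report is the more scalable way to apply the same tag set):

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "example-customer-bucket"  # placeholder

# Tag everything outside metadata/ as expirable; a lifecycle rule filtering on
# allow_expiration=true then leaves metadata/ untouched.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        if obj["Key"].startswith("metadata/"):
            continue
        s3.put_object_tagging(
            Bucket=BUCKET,
            Key=obj["Key"],
            Tagging={"TagSet": [{"Key": "allow_expiration", "Value": "true"}]},
        )
```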
I'm looking for a way to be notified when an object in s3 changes storage class. I thought there would be a bucket event notification for this but I don't see it as an option. How can I know when an object moves from STANDARD to GLACIER? We have systems that depend on objects not being in GLACIER. If they change to GLACIER, we need to be made aware and handle them accordingly.
https://docs.aws.amazon.com/AmazonS3/latest/userguide/notification-how-to-event-types-and-destinations.html#supported-notification-event-types
You can use S3 access logs to capture lifecycle changes, but I think that's about it:
Amazon S3 server access logs can be enabled in an S3 bucket to capture S3 Lifecycle-related actions such as object transition to another storage class
Taken from the AWS docs on lifecycle and other bucket configurations.
You could certainly roll your own notifications for storage class transitions, though it might be a bit more involved than you are hoping for. You need a separate bucket to write your access logs to. Set up an S3 notification for object creation in your new logs bucket to trigger a Lambda function that processes each new log file. In your Lambda function, use Athena to query the logs and fire off an SNS alert or perform some corrective action in code.
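A stripped-down sketch of that idea, skipping Athena and simply scanning each new access-log file for lifecycle transition records (the SNS topic ARN is a placeholder, and the exact operation codes, e.g. S3.TRANSITION.OBJECT, should be checked against the server access log docs):

```python
import boto3
from urllib.parse import unquote_plus

s3 = boto3.client("s3")
sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:storage-class-alerts"  # placeholder

def handler(event, context):
    """Triggered by object creation in the access-logs bucket."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        # Lifecycle transitions show up as operations such as S3.TRANSITION.OBJECT;
        # verify the exact codes you care about in the server access log docs.
        hits = [line for line in body.splitlines() if "S3.TRANSITION" in line]
        if hits:
            sns.publish(TopicArn=TOPIC_ARN, Message="\n".join(hits[:20]))
```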
There are some limitations to be aware of, though: best-effort logging means you might not get logs for a few hours.
Updated 28/5/21
If logging is on, you should see the various lifecycle operations logged as they happen, +/- a few hours. If not, are you definitely meeting the minimum criteria for transitioning objects to Glacier? (E.g. it takes 30 days to transition from Standard to Glacier.)
As for:
The log record for a particular request might be delivered long after the request was actually processed, or it might not be delivered at all.
Consider S3's eventual consistency model and the SLA on data durability: there is a possibility of data loss for any object in S3. I think the risk of losing log records is relatively low, but it could happen.
You could also go for a more active approach: use the S3 API from a Lambda function triggered by CloudWatch Events (cron-like scheduling) to scan all the objects in the bucket and act accordingly (send an email, take corrective action, etc.). Bear in mind this might get expensive depending on how often you run the Lambda and how many objects are in your bucket, but low volumes might even fit in the free tier depending on your usage.
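A sketch of that scheduled scan (the bucket name and topic ARN are placeholders):

```python
import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")
BUCKET = "my-data-bucket"  # placeholder
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:glacier-alerts"  # placeholder

def handler(event, context):
    """Scheduled (EventBridge cron) scan that reports objects already in Glacier."""
    glacier_keys = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            # list_objects_v2 includes each object's storage class.
            if obj.get("StorageClass") in ("GLACIER", "DEEP_ARCHIVE"):
                glacier_keys.append(obj["Key"])
    if glacier_keys:
        sns.publish(
            TopicArn=TOPIC_ARN,
            Message=f"{len(glacier_keys)} objects in Glacier, e.g. {glacier_keys[:10]}",
        )
```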
As of Nov 2021 you can now do this via AWS EventBridge.
Simply create a new rule on the S3 bucket that handles the Object Storage Class Changed event.
See https://aws.amazon.com/blogs/aws/new-use-amazon-s3-event-notifications-with-amazon-eventbridge/
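A hedged boto3 sketch of the setup: turn on EventBridge delivery for the bucket, then create a rule matching the Object Storage Class Changed detail type (the bucket name, rule name, and target ARN are placeholders):

```python
import json
import boto3

s3 = boto3.client("s3")
events = boto3.client("events")
BUCKET = "my-data-bucket"  # placeholder

# 1. Turn on EventBridge notifications for the bucket.
s3.put_bucket_notification_configuration(
    Bucket=BUCKET,
    NotificationConfiguration={"EventBridgeConfiguration": {}},
)

# 2. Create a rule matching storage-class changes for that bucket.
events.put_rule(
    Name="s3-storage-class-changed",
    EventPattern=json.dumps({
        "source": ["aws.s3"],
        "detail-type": ["Object Storage Class Changed"],
        "detail": {"bucket": {"name": [BUCKET]}},
    }),
)

# 3. Point the rule at a target, e.g. an SNS topic or Lambda (placeholder ARN);
#    the target's resource policy must allow EventBridge to invoke/publish to it.
events.put_targets(
    Rule="s3-storage-class-changed",
    Targets=[{"Id": "1", "Arn": "arn:aws:sns:us-east-1:123456789012:storage-class-alerts"}],
)
```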
Is there a way to make a Google Cloud Storage bucket "append-only"?
To clarify, I want to make it so that trying to overwrite/modify an existing object returns an error.
Right now the only way I see to do this is client-side, by checking if the object exists before trying to write to it, but that doubles the number of calls I need to make.
There are several Google Cloud Storage features that you can enable:
Object Versioning
Bucket Lock
Retention Policies
The simplest method is to enable Object Versioning. Overwrites and deletes then create new object versions instead of destroying the existing data. This does require changes to client code to know how to request a specific version of an object if multiple versions have been created due to object overwrites and deletes.
Cloud Storage Object Versioning
For more complicated scenarios, implement Bucket Lock and retention policies. These features allow you to configure a data retention policy for a Cloud Storage bucket that governs how long objects in the bucket must be retained.
Retention policies and Bucket Lock
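A short sketch using the google-cloud-storage Python client (the bucket name and retention period are placeholders):

```python
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-append-only-bucket")  # placeholder name

# Object Versioning: an overwrite creates a new generation instead of
# destroying the old data, and a delete just leaves a noncurrent version.
bucket.versioning_enabled = True
bucket.patch()

# Alternatively (or additionally), a retention policy: objects cannot be
# deleted or replaced until they reach this age.
bucket.retention_period = 30 * 24 * 60 * 60  # 30 days, in seconds
bucket.patch()
# bucket.lock_retention_policy()  # Bucket Lock is irreversible; uncomment with care
```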
Is there any way to move less frequently used S3 data to Glacier automatically? I mean, is there some option or service that searches S3 by last access date and then assigns a lifecycle policy, so the data can be moved to Glacier, or do I have to write a program to do this? If this is not possible, is there any way to assign a lifecycle policy to all the buckets at once?
Looking for some feedback. Thank you.
No, this isn't possible as a ready-made feature. However, there is something that might help: Amazon S3 Analytics.
This produces a report of which items in your buckets are less frequently used. This information can be used to find items that should be archived.
It could be possible to use the S3 Analytics output as input for a script that tags items for archiving. However, the complete feature (find infrequently used items and then archive them) doesn't seem to be available as a standard product.
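If it helps, a hedged boto3 sketch of enabling storage class analysis with a daily CSV export that such a script could consume (both bucket names and the configuration ID are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Enable storage-class analysis on a bucket and export the daily CSV report
# to a separate reporting bucket.
s3.put_bucket_analytics_configuration(
    Bucket="my-data-bucket",
    Id="whole-bucket-analysis",
    AnalyticsConfiguration={
        "Id": "whole-bucket-analysis",
        "StorageClassAnalysis": {
            "DataExport": {
                "OutputSchemaVersion": "V_1",
                "Destination": {
                    "S3BucketDestination": {
                        "Format": "CSV",
                        "Bucket": "arn:aws:s3:::my-analytics-reports",
                        "Prefix": "analysis/my-data-bucket/",
                    }
                },
            }
        },
    },
)
```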
You can do this by adding a tag or prefix to the objects in your buckets.
Then create a lifecycle rule targeting that tag or prefix, so you can group the data together and apply a single lifecycle policy.
https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html
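A hedged sketch of applying one such rule across every bucket in the account (the tag filter and the 30-day threshold are placeholders; note that put_bucket_lifecycle_configuration replaces a bucket's existing lifecycle configuration, and buckets in other regions may need a client created for the matching region):

```python
import boto3

s3 = boto3.client("s3")

# One transition rule reused for every bucket: objects tagged archive=true
# move to Glacier 30 days after creation.
rule = {
    "ID": "archive-to-glacier",
    "Filter": {"Tag": {"Key": "archive", "Value": "true"}},
    "Status": "Enabled",
    "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
}

for bucket in s3.list_buckets()["Buckets"]:
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket["Name"],
        LifecycleConfiguration={"Rules": [rule]},
    )
```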