We need to implement an X-day expiration of all customer data due to contractual obligations. Not too big of a deal, that's about as easy as it gets.
But at the same time, some customers' projects have files with metadata, perhaps dataset definitions, which most definitely DO NOT need to go away. We have free rein to tag or manipulate any of the data in any way we see fit. Since we have 500+ S3 buckets, we need a somewhat global solution.
Ideally, we would simply set an expiration on the bucket and another rule for the metadata/ prefix. Except then we have a rule overlap and metadata/* files will still get the X-day expiration that's been applied to the entire bucket.
We can forcefully tag all objects NOT in metadata/* with something like allow_expiration = true using Lambda. While not out of the question, I would like to implement something a little more built-in with S3.
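Something along the lines of this boto3 sketch is what that Lambda would do (the bucket name is a placeholder, and note that put_object_tagging overwrites any existing tags on an object):

```python
# Rough sketch: tag every object NOT under metadata/ so a tag-filtered
# lifecycle rule can expire only those objects.
import boto3

s3 = boto3.client("s3")
BUCKET = "example-customer-bucket"  # placeholder

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.startswith("metadata/"):
            continue  # leave metadata objects untagged so they never expire
        # Caution: this replaces the object's existing tag set.
        s3.put_object_tagging(
            Bucket=BUCKET,
            Key=key,
            Tagging={"TagSet": [{"Key": "allow_expiration", "Value": "true"}]},
        )
```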
I don't think there's a way to implement what I'm after without using some kind of tagging and external script. Thoughts?
If you've got a free hand on tagging the objects, you could use a prefix and/or a tag filter with S3 Lifecycle.
You can filter objects by key prefix, object tags, or a combination of both (in which case Amazon S3 uses a logical AND to combine the filters).
See Lifecycle Filter Rules
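For instance, a minimal boto3 sketch (bucket name, tag, and day count are placeholders, and this call replaces the bucket's existing lifecycle configuration) of a rule that expires only objects carrying a specific tag; a key prefix could be combined with the tag via an "And" filter, per the docs above:

```python
# Expire only objects tagged allow_expiration=true, wherever they live in the bucket.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-customer-bucket",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-tagged-objects",
                "Status": "Enabled",
                "Filter": {"Tag": {"Key": "allow_expiration", "Value": "true"}},
                "Expiration": {"Days": 90},  # placeholder for the contractual "X days"
            }
        ]
    },
)
```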
You could automate the creation and management of your lifecycle rules with IaC, for example, terraform.
See S3 Bucket Lifecycle Configuration with Terraform
There's a useful blog on how to manage these dynamically here.
What's more, using tags has a number of additional benefits:
- Object tags enable fine-grained access control of permissions. For example, you could grant an IAM user permissions to read-only objects with specific tags.
- Object tags enable fine-grained object lifecycle management in which you can specify a tag-based filter, in addition to a key name prefix, in a lifecycle rule.
- When using Amazon S3 analytics, you can configure filters to group objects together for analysis by object tags, by key name prefix, or by both prefix and tags.
- You can also customize Amazon CloudWatch metrics to display information by specific tag filters.
Source, and more on how to set tags on multiple Amazon S3 objects with a single request.
Related
I have a bucket that stores many of my application's logs and I want to be able to retain certain objects in the bucket. The objects do not have any tags associated with them except for the ones I want to retain, e.g. tag: {permanent:true}
How do I set the lifecycle policy so that I can retain permanent objects while other objects in the bucket follow the bucket-level lifecycle policy?
In my opinion you can try something like this: add a tag like permanent:false to all the objects which you do not want to retain.
So now you have objects with permanent:true or permanent:false.
Now create a lifecycle rule, choose 'Limit the scope of this rule using one or more filters', add the tag permanent:false, and set the action to permanently delete versions of such objects.
Otherwise, an alternative solution is to schedule a Lambda which checks every day and deletes files that do not have the tag permanent:true.
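If it helps, a rough sketch of that scheduled Lambda (the bucket name is a placeholder, and a real function would need error handling and care around versioned buckets):

```python
# Delete any object that does not carry the tag permanent=true.
import boto3

s3 = boto3.client("s3")
BUCKET = "example-log-bucket"  # placeholder

def handler(event, context):
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            tags = s3.get_object_tagging(Bucket=BUCKET, Key=key)["TagSet"]
            keep = any(t["Key"] == "permanent" and t["Value"] == "true" for t in tags)
            if not keep:
                s3.delete_object(Bucket=BUCKET, Key=key)
```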
Is there a way for me to know how many times an object in my bucket has been requested?
If you know which objects you care about in advance you can probably go the way samtoddler suggested. For a more generic approach there are two options:
You can enable object-level logging in CloudTrail. CloudTrail will then track all API calls concerning the bucket, and you can parse the information from CloudTrail to get the desired info.
You can enable server access logging in S3 and store access logs for the bucket into another bucket. You can then use something like Athena to compile more detailed statistics about any particular objects in your bucket.
Personally I'd go with option 2) as the format is a little easier to work with for simpler queries. For a comparison of the options take a look at this documentation. Note that these options are not mutually exclusive, you can use both if you like.
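As a starting point for option 2, a minimal boto3 sketch (bucket names are placeholders, and the target bucket must already allow the S3 logging service to write to it) of turning on server access logging:

```python
# Enable S3 server access logging so requests against the source bucket are
# written to a separate log bucket, which Athena can later query.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_logging(
    Bucket="example-app-bucket",  # bucket whose requests you want to count
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "example-access-logs-bucket",
            "TargetPrefix": "example-app-bucket/",
        }
    },
)
```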
This can be done via S3 request metrics in Amazon CloudWatch. Further, for your use case, you can configure the metrics filter by object tag or prefix.
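For example, a hedged sketch (bucket, ID, and prefix are placeholders) of adding a request-metrics configuration scoped by a prefix; a Tag filter could be used instead:

```python
# Enable S3 request metrics in CloudWatch, limited to one prefix.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_metrics_configuration(
    Bucket="example-app-bucket",
    Id="requests-on-important-prefix",
    MetricsConfiguration={
        "Id": "requests-on-important-prefix",
        "Filter": {"Prefix": "important/"},
    },
)
```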
Is there any way to move less frequently accessed S3 buckets to Glacier automatically? I mean to say, is there some option or service that searches S3 by last access date and then assigns a lifecycle policy, so the data can be moved to Glacier? Or do I have to write a program to do this? If this is not possible, is there any way to assign a lifecycle policy to all the buckets at once?
Looking for some feedback. Thank you.
No, this isn't possible as a ready-made feature. However, there is something that might help: Amazon S3 Analytics.
This produces a report of which items in your buckets are less frequently used. This information can be used to find items that should be archived.
It could be possible to use the S3 Analytics output as input for a script to tag items for archiving. However, this complete feature (find infrequently used items and then archive them) doesn't seem to be available as a standard product.
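As an illustration, a rough boto3 sketch (all names and ARNs are placeholders) of enabling storage class analysis and exporting the daily report to another bucket for such a script to consume:

```python
# Turn on S3 storage class analysis and export the report as CSV to another bucket.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_analytics_configuration(
    Bucket="example-data-bucket",
    Id="access-pattern-analysis",
    AnalyticsConfiguration={
        "Id": "access-pattern-analysis",
        "StorageClassAnalysis": {
            "DataExport": {
                "OutputSchemaVersion": "V_1",
                "Destination": {
                    "S3BucketDestination": {
                        "Format": "CSV",
                        "Bucket": "arn:aws:s3:::example-analytics-results",
                        "Prefix": "example-data-bucket/",
                    }
                },
            }
        },
    },
)
```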
You can do this by adding a tag or prefix to your buckets.
Create a lifecycle rule that targets that tag or prefix to group your buckets together, and assign/apply a single lifecycle policy.
https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html
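For example, a hedged boto3 sketch (bucket names, tag, and day count are placeholders, and this call replaces any existing lifecycle configuration on each bucket) of applying the same Glacier transition rule to a group of buckets:

```python
# Apply one shared lifecycle rule across several buckets: transition tagged
# objects to Glacier after 90 days.
import boto3

s3 = boto3.client("s3")
buckets = ["example-bucket-1", "example-bucket-2"]  # the buckets you grouped

for bucket in buckets:
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-to-glacier",
                    "Status": "Enabled",
                    "Filter": {"Tag": {"Key": "archive", "Value": "true"}},
                    "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                }
            ]
        },
    )
```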
I have a task to control the object lifecycle of particular objects in an S3 bucket. E.g., most of the objects should expire and be deleted according to the lifecycle policy, but for some objects I want the expiration never to happen. In Amazon SQS there is a possibility to control the lifecycle parameters of each single message, but I can't find such a feature in the docs for S3. Is it possible?
No, it isn't. Lifecycle policies apply to all the objects in the bucket, or all the objects with a matching prefix. You'd need to set the policy on a specific key prefix, store the objects you want to match the policy using that prefix, and store the other objects under a different prefix. That's the closest thing available, and it's not really all that close.
My application uses Amazon S3 to store some files uploaded by customers. I want to set a rule that automatically watches a particular folder's content, specifically to delete files that were created a month ago. Is that possible?
Yes, you can set a rule that automatically watches a particular folder's content and deletes files that were created a month ago.
For this, go to 'Lifecycle policy' -> 'Expiration'. In the 'Expiration' section, set the prefix to the path of the files you want the rule to apply to.
For example: if I want to apply the rule to 'fileA.txt' in folder 'myFolder' in bucket 'myBucket', then I should set the prefix to 'myFolder/'.
Amazon S3 has a flat structure with no hierarchy like you would see in a typical file system. However, for the sake of organizational simplicity, the Amazon S3 console supports the folder concept as a means of grouping objects. Amazon S3 does this by using key name prefixes for objects.
For more info refer: http://docs.aws.amazon.com/AmazonS3/latest/UG/FolderOperations.html
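For reference, a minimal boto3 sketch of the same rule set up via the API rather than the console, using the example names from above (and note this call replaces the bucket's existing lifecycle configuration):

```python
# Expire objects under the myFolder/ prefix 30 days after creation.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="myBucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-myFolder-after-30-days",
                "Status": "Enabled",
                "Filter": {"Prefix": "myFolder/"},
                "Expiration": {"Days": 30},
            }
        ]
    },
)
```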
Yes, it is possible to delete/expire objects and transition them to lower-cost storage classes in AWS to save cost. You can find it under
S3 - [Your_Folder] - Management - Create Lifecycle rule
Provide the folder that you want to perform the action on in the prefix section, as "folder/".
Yes. You can set up an S3 lifecycle policy that will make S3 automatically delete all files older than X days: http://docs.aws.amazon.com/AmazonS3/latest/dev/ObjectExpiration.html