I have a bucket that stores many of my application's logs, and I want to be able to retain certain objects in it. The objects have no tags except for the ones I want to retain, e.g. tag: {permanent: true}.
How do I set up the lifecycle policy so that the permanent objects are retained while the other objects in the bucket follow the bucket-level lifecycle policy?
One approach: add a tag such as permanent:false to all the objects you do not want to retain, so that every object now carries either permanent:true or permanent:false.
Then create a lifecycle rule, choose "Limit the scope of this rule using one or more filters", add the tag permanent:false as the filter, and set the action to permanently delete versions of the matching objects.
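A minimal sketch of that rule with boto3, assuming a hypothetical bucket name and a 30-day expiration (note that this call replaces the bucket's entire lifecycle configuration, so any existing rules would need to be included as well):

    import boto3

    s3 = boto3.client("s3")

    # Expire only objects tagged permanent:false; objects tagged permanent:true
    # (or carrying no matching tag) are not matched by this rule and are kept.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-log-bucket",  # hypothetical bucket name
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-non-permanent-logs",
                    "Filter": {"Tag": {"Key": "permanent", "Value": "false"}},
                    "Status": "Enabled",
                    "Expiration": {"Days": 30},  # assumed retention window
                }
            ]
        },
    )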
Alternatively, schedule a Lambda that runs every day and deletes the objects that do not have the tag permanent:true.
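A rough sketch of such a Lambda handler, assuming the same hypothetical bucket name and an EventBridge schedule as the daily trigger:

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-log-bucket"  # hypothetical bucket name


    def handler(event, context):
        """Delete every object that is not tagged permanent:true."""
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=BUCKET):
            for obj in page.get("Contents", []):
                tags = s3.get_object_tagging(Bucket=BUCKET, Key=obj["Key"])["TagSet"]
                tag_map = {t["Key"]: t["Value"] for t in tags}
                if tag_map.get("permanent") != "true":
                    s3.delete_object(Bucket=BUCKET, Key=obj["Key"])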
Related
We need to implement an expiration of X days on all customer data due to contractual obligations. Not too big of a deal; that's about as easy as it gets.
But at the same time, some customers' projects have files with metadata, perhaps dataset definitions, which most definitely DO NOT need to go away. We have free rein to tag or manipulate any of the data in any way we see fit. Since we have 500+ S3 buckets, we need a somewhat global solution.
Ideally, we would simply set an expiration on the bucket and another rule for the metadata/ prefix. Except then we have a rule overlap, and metadata/* files will still get the X-day expiration that's been applied to the entire bucket.
We can forcefully tag all objects NOT in metadata/* with something like allow_expiration = true using Lambda. While not out of the question, I would like to implement something a little more built into S3.
I don't think there's a way to implement what I'm after without using some kind of tagging and an external script. Thoughts?
If you've got a free hand with tagging the objects, you could use a prefix filter, a tag filter, or both with S3 Lifecycle.
You can filter objects by key prefix, object tags, or a combination of both (in which case Amazon S3 uses a logical AND to combine the filters).
See Lifecycle Filter Rules
You could automate the creation and management of your lifecycle rules with IaC, for example Terraform.
See S3 Bucket Lifecycle Configuration with Terraform
There's a useful blog on how to manage these dynamically here.
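Sticking with the allow_expiration tag from the question, a hedged boto3 sketch of a rule that combines a prefix with a tag filter (the logical AND described above) might look like this; the bucket name, prefix, tag, and 90-day window are all assumptions:

    import boto3

    s3 = boto3.client("s3")

    # One rule whose filter ANDs a key prefix with an object tag.
    # For the "everything except metadata/" case you could drop the Prefix and
    # keep only the tag filter, since metadata/ objects never get tagged
    # allow_expiration=true and so never match any expiration rule.
    s3.put_bucket_lifecycle_configuration(
        Bucket="customer-data-bucket",  # hypothetical bucket name
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-tagged-customer-data",
                    "Filter": {
                        "And": {
                            "Prefix": "projects/",  # hypothetical prefix
                            "Tags": [{"Key": "allow_expiration", "Value": "true"}],
                        }
                    },
                    "Status": "Enabled",
                    "Expiration": {"Days": 90},  # stand-in for the contractual X days
                }
            ]
        },
    )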
What's more, using tags has a number of additional benefits:
- Object tags enable fine-grained access control of permissions. For example, you could grant an IAM user permissions to read-only objects with specific tags.
- Object tags enable fine-grained object lifecycle management in which you can specify a tag-based filter, in addition to a key name prefix, in a lifecycle rule.
- When using Amazon S3 analytics, you can configure filters to group objects together for analysis by object tags, by key name prefix, or by both prefix and tags.
- You can also customize Amazon CloudWatch metrics to display information by specific tag filters.
Source, with more on how to set tags on multiple Amazon S3 objects with a single request.
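If you do end up tagging existing objects in bulk (for example everything outside metadata/), a hedged boto3 sketch could look like the following; the bucket name and tag are assumptions, and for very large buckets an S3 Batch Operations "Replace all object tags" job would be the more built-in route:

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "customer-data-bucket"  # hypothetical bucket name

    # PutObjectTagging replaces the full tag set of one object in a single request.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            if obj["Key"].startswith("metadata/"):
                continue  # leave metadata objects untagged so no expiration rule matches
            s3.put_object_tagging(
                Bucket=BUCKET,
                Key=obj["Key"],
                Tagging={"TagSet": [{"Key": "allow_expiration", "Value": "true"}]},
            )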
Is there a way to trigger a Lambda before a bucket is actually deleted (for example, when the stack it is part of is deleted) or emptied, in order to copy its objects? Maybe something other than a Lambda could be used instead?
Deletion of a CloudFormation (CFN) stack with a non-empty bucket will fail, since non-empty buckets can't be deleted, unless you set the bucket's DeletionPolicy to Retain. With Retain, the stack is deleted but the bucket is left behind in your account. Without Retain, you have to delete all objects in the bucket before the bucket itself can be deleted.
Either way, you have to delete the objects yourself, typically through a custom Lambda function. There is no out-of-the-box mechanism in CFN or S3 to delete objects when a bucket is deleted. But since this is something you have to develop yourself, you can do whatever you want with those objects before you actually delete them, e.g. copy them to Glacier.
There are a few ways this can be achieved, but probably the most common is through a custom resource, similar to the one given in the AWS blog:
How do I use custom resources with Amazon S3 buckets in AWS CloudFormation?
The resource given in that blog responds to the Delete event in CFN and deletes the objects in the bucket:
b_operator.Bucket(str(the_bucket)).objects.all().delete()
So you would have to modify this custom resource to copy objects before the deletion operation is performed.
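A hedged sketch of that modification, assuming a hypothetical archive bucket and the same boto3 resource API used in the blog's snippet:

    import boto3

    s3 = boto3.resource("s3")


    def archive_then_empty(source_bucket: str, archive_bucket: str) -> None:
        """Copy every object to an archive bucket, then empty the source bucket.

        Intended to be called from the custom resource's Delete handler before
        CloudFormation removes the bucket itself.
        """
        source = s3.Bucket(source_bucket)
        for obj in source.objects.all():
            # Server-side copy into the archive bucket; a Glacier storage class
            # could be requested here with StorageClass="GLACIER".
            s3.Object(archive_bucket, obj.key).copy_from(
                CopySource={"Bucket": source_bucket, "Key": obj.key}
            )
        # Same call as in the blog's custom resource, now after archiving.
        source.objects.all().delete()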
I have a bucket in which I have set s3:Delete* to Deny, so that objects don't get deleted from the bucket. However, I want to move some objects to an s3://bucket-name/trash directory, and set a lifecycle policy to delete all the items in trash after 30 days.
I am not able to move those items, because the Deny on Delete overrides everything else. Is there any solution that would let me bypass the Delete Deny policy just enough to move objects into that one folder?
Thanks
According to the documentation,
This action creates a copy of all specified objects with updated settings, updates the last-modified date in the specified location, and adds a delete marker to the original object.
The reason your approach doesn't work is that a move is essentially a copy followed by a delete. An alternative is to enable bucket versioning and apply a lifecycle policy that expires noncurrent versions after 30 days. Finally, change the bucket policy to deny only s3:DeleteObjectVersion: a regular delete then merely adds a delete marker, while the underlying versions stay protected until the lifecycle rule removes them.
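A minimal sketch of that setup with boto3; the bucket name, the 30-day window, and the policy statement are assumptions:

    import json

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-bucket"  # hypothetical bucket name

    # 1. Turn on versioning so a delete only adds a delete marker.
    s3.put_bucket_versioning(
        Bucket=BUCKET, VersioningConfiguration={"Status": "Enabled"}
    )

    # 2. Expire the noncurrent ("deleted") versions after 30 days.
    s3.put_bucket_lifecycle_configuration(
        Bucket=BUCKET,
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-noncurrent-versions",
                    "Filter": {"Prefix": ""},
                    "Status": "Enabled",
                    "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
                }
            ]
        },
    )

    # 3. Deny only permanent version deletion; plain DeleteObject still works,
    #    so a move (copy + delete) succeeds while the data survives as a
    #    noncurrent version until the lifecycle rule removes it.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyPermanentDeletes",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:DeleteObjectVersion",
                "Resource": f"arn:aws:s3:::{BUCKET}/*",
            }
        ],
    }
    s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))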
A bucket policy is not the best place to prevent objects from being deleted. Instead, enable Object Lock at the bucket level, then put objects in governance mode so they can't be deleted by normal operations. When you do need to move them, you can still bypass the protection with the s3:BypassGovernanceRetention permission. See: https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lock-managing.html
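A sketch of governance-mode retention and the bypass with boto3, assuming a bucket that was created with Object Lock enabled and hypothetical key, date, and version id values:

    from datetime import datetime, timezone

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "locked-bucket"  # hypothetical; must have been created with Object Lock enabled
    KEY = "logs/app.log"      # hypothetical key

    # Protect this object version: it cannot be permanently deleted through
    # normal requests until the retain-until date passes.
    s3.put_object_retention(
        Bucket=BUCKET,
        Key=KEY,
        Retention={
            "Mode": "GOVERNANCE",
            "RetainUntilDate": datetime(2030, 1, 1, tzinfo=timezone.utc),
        },
    )

    # A principal holding s3:BypassGovernanceRetention can still remove the
    # version as part of a "move" by explicitly bypassing governance mode.
    s3.delete_object(
        Bucket=BUCKET,
        Key=KEY,
        VersionId="example-version-id",  # hypothetical version id
        BypassGovernanceRetention=True,
    )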
I have a DynamoDB table that tracks an S3 bucket's objects and their tags, and I want to keep the table consistent with the objects' state. This means updating the table when object tags are added, updated, or deleted.
Currently I am trying to achieve the above by running a Lambda that writes to the table when the above events occur. I'm having trouble finding tag-related triggers for my use case.
Here are some potential solutions that I've researched:
S3 PUT trigger, for detecting tag creation: However, it seems this trigger only fires on object creation, not on PutObjectTagging PUT requests.
Creating a CloudWatch rule for detecting tag changes: The problem with this one is that the tag support doesn't seem to go beyond the service level to tag changes on individual S3 objects. Also, tag creation and deletion are left out by this solution.
I have a task to control the object lifecycle of particular objects in an S3 bucket. E.g., most of the objects should expire and be deleted according to the lifecycle policy, but for some objects I want the expiration never to happen. In Amazon SQS there is a way to control the lifecycle parameters of each individual message, but I can't find such a feature in the docs for S3. Is it possible?
No, it isn't. Lifecycle policies apply to all the objects in the bucket, or to all the objects with a matching prefix. You'd need to set the policy on a specific key prefix, store the objects you want the policy to apply to under that prefix, and store the other objects under a different prefix. That's the closest thing available, and it's not really all that close.
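For completeness, a sketch of that prefix-based workaround in boto3 (the bucket name, prefixes, and the 30-day expiration are assumptions; as the other answers above note, newer lifecycle configurations can also filter by object tags):

    import boto3

    s3 = boto3.client("s3")

    # Only objects under temp/ expire; anything stored under a different
    # prefix (e.g. keep/) is untouched by this rule.
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-bucket",  # hypothetical bucket name
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-temp-prefix-only",
                    "Filter": {"Prefix": "temp/"},
                    "Status": "Enabled",
                    "Expiration": {"Days": 30},  # assumed expiration window
                }
            ]
        },
    )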