My application uses Amazon S3 to store files uploaded by customers. I want to set up a rule that automatically watches a particular folder's content and deletes files that were created a month ago. Is that possible?
Yes, you can set up a rule that automatically watches a particular folder's content and deletes files that were created a month ago.
To do this, go to 'Lifecycle policy' -> 'Expiration'. In the 'Expiration' section, set the prefix to the path of the files you want the rule to apply to.
For example, if I want to apply the rule to 'fileA.txt' in folder 'myFolder' in bucket 'myBucket', then I should set the prefix to 'myFolder/'.
Amazon S3 has a flat structure with no hierarchy like you would see in a typical file system. However, for the sake of organizational simplicity, the Amazon S3 console supports the folder concept as a means of grouping objects. Amazon S3 does this by using key name prefixes for objects.
For more info, refer to http://docs.aws.amazon.com/AmazonS3/latest/UG/FolderOperations.html
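If you prefer the command line, the same rule can be created with the AWS CLI. Here's a minimal sketch using the 'myBucket' / 'myFolder/' names from the example above and a 30-day expiration; adjust the names and the number of days to your case.

# Expire everything under myFolder/ 30 days after creation
aws s3api put-bucket-lifecycle-configuration \
    --bucket myBucket \
    --lifecycle-configuration '{
        "Rules": [
            {
                "ID": "expire-myFolder-after-30-days",
                "Filter": { "Prefix": "myFolder/" },
                "Status": "Enabled",
                "Expiration": { "Days": 30 }
            }
        ]
    }'

S3 evaluates lifecycle rules roughly once a day, so objects are removed within about a day of becoming eligible rather than at the exact moment they turn 30 days old.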
Yes, it is possible to delete/expire objects and transition them to lower-cost storage classes in AWS to save cost. You can find it under
S3 - [Your_Folder] - Management - Create Lifecycle rule
Provide the folder that you want the rule to act on in the prefix section, as "folder/".
Yes. You can set up an S3 lifecycle policy that will make S3 automatically delete all files older than X days: http://docs.aws.amazon.com/AmazonS3/latest/dev/ObjectExpiration.html
We need to implement an expiration of X days of all customer data due to contractual obligations. Not too big of a deal, that's about as easy as it gets.
But at the same time, some customers' projects have files with metadata, perhaps dataset definitions, which most definitely DO NOT need to go away. We have free rein to tag or manipulate any of the data in any way we see fit. Since we have 500+ S3 buckets, we need a somewhat global solution.
Ideally, we would simply set an expiration on the bucket and another rule for the metadata/ prefix. Except then we have a rule overlap and metadata/* files will still get the X day expiration that's been applied to the entire bucket.
We can forcefully tag all objects NOT in metadata/* with something like allow_expiration = true using Lambda. While not out of the question, I would like to implement something a little more built-in with S3.
I don't think there's a way to implement what I'm after without using some kind of tagging and external script. Thoughts?
If you've got a free hand on tagging the objects, you could use a prefix and/or a tag filter with S3 Lifecycle rules.
You can filter objects by key prefix, object tags, or a combination of both (in which case Amazon S3 uses a logical AND to combine the filters).
See Lifecycle Filter Rules
You could automate the creation and management of your lifecycle rules with IaC, for example Terraform.
See S3 Bucket Lifecycle Configuration with Terraform
There's a useful blog on how to manage these dynamically here.
What's more, using tags has a number of additional benefits:
Object tags enable fine-grained access control of permissions. For example, you could grant an IAM user permissions to read-only objects with specific tags.
Object tags enable fine-grained object lifecycle management in which you can specify a tag-based filter, in addition to a key name prefix, in a lifecycle rule.
When using Amazon S3 analytics, you can configure filters to group objects together for analysis by object tags, by key name prefix, or by both prefix and tags.
You can also customize Amazon CloudWatch metrics to display information by specific tag filters.
Source and more on how to set tags on multiple Amazon S3 objects with a single request.
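To make that concrete for the metadata/ question above, here's a minimal sketch of a lifecycle configuration that only expires objects carrying the allow_expiration=true tag mentioned in the question, so untagged objects under metadata/ are never touched; the bucket name and the 30-day value are placeholders.

# Expire only objects tagged allow_expiration=true, anywhere in the bucket
aws s3api put-bucket-lifecycle-configuration \
    --bucket YOUR_BUCKET \
    --lifecycle-configuration '{
        "Rules": [
            {
                "ID": "expire-tagged-objects",
                "Filter": { "Tag": { "Key": "allow_expiration", "Value": "true" } },
                "Status": "Enabled",
                "Expiration": { "Days": 30 }
            }
        ]
    }'

With 500+ buckets you would loop that over a bucket list or, as suggested above, generate the rules with Terraform.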
Is there a way similar to an S3 lifecycle policy (or just an s3 bucket policy) that will automatically delete objects in a bucket older than x days and with a given file extension?
Depending on the extension it might be a bucket wide delete action or only delete objects under certain prefixes.
with a given file extension
Sadly, you can't do this with S3 lifecycles, as they only work based on prefix, not a suffix such as an extension.
This means you need a custom solution for that problem. Depending on the exact nature of the issue (number of files, how frequently you want to perform the deletion operation), there are several ways of doing this. They include running a single Lambda on a schedule, S3 Batch Operations, using DynamoDB to store the metadata, and so on.
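As a rough sketch of the "single script/Lambda on a schedule" option, the AWS CLI can filter a listing by suffix and age with a JMESPath query and then delete the matches; the bucket name, the .log extension and the cutoff date are placeholders, and for large buckets you would add pagination and a dry run first.

# List keys ending in .log that were last modified before the cutoff, then delete them
aws s3api list-objects-v2 \
    --bucket YOUR_BUCKET \
    --query "Contents[?ends_with(Key, '.log') && LastModified < '2024-01-01'].Key" \
    --output text |
tr '\t' '\n' |
while read -r key; do
    [ "$key" = "None" ] && continue   # nothing matched
    aws s3 rm "s3://YOUR_BUCKET/${key}"
done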
If you are uploading your files yourself (PutObject, etc.), you can tag your objects and then use the tag to delete them with an S3 lifecycle rule.
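For example (the bucket, key and the expire=true tag below are just placeholders), you can attach the tag at upload time, or afterwards, and then point a tag-based lifecycle rule like the one shown earlier at that tag:

# Upload and tag in one request
aws s3api put-object \
    --bucket YOUR_BUCKET \
    --key logs/app.log \
    --body ./app.log \
    --tagging "expire=true"

# Or tag an object that is already in the bucket
aws s3api put-object-tagging \
    --bucket YOUR_BUCKET \
    --key logs/app.log \
    --tagging 'TagSet=[{Key=expire,Value=true}]'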
Currently, my S3 bucket contains files. I want to create a folder for each file on S3.
Current -> s3://<bucket>/test.txt
Expectation -> s3://<bucket>/test/test.txt
How can I achieve this using the EC2 instance?
S3 doesn't really have "folders"; object names may contain / characters, which in a way emulates folders. Simply name your objects test/<filename> to achieve that. See the S3 docs for more.
As for doing it from EC2, it is no different from doing it from anywhere else (except, maybe, in EC2 you may be able to rely on an IAM profile instead of using ad-hoc credentials). If you've tried it and failed, maybe post a new question with more details.
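For the single file in the question, that's just one move under the new key; something like this should do it (same bucket placeholder and file names as in the question):

aws s3 mv "s3://<bucket>/test.txt" "s3://<bucket>/test/test.txt"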
If you have Linux you can try something like:
aws s3 ls s3://bucket/ | while read -r date time size name; do aws s3 mv "s3://bucket/${name}" "s3://bucket/${name%.*}/${name}"; done
It does not depend on the EC2 instance. You can use the AWS CLI from an EC2 instance or from anywhere else, supplying the desired path, in your case s3://<bucket>/test/test.txt. You can even change the name of the file you are copying into the S3 bucket, and even its extension, if you want.
Does anyone know if it is possible to replicate just a folder of a bucket between 2 buckets using AWS S3 replication feature?
P.S.: I don't want to replicate the entire bucket, just one folder of the bucket.
If it is possible, what configurations I need to add to filter that folder in the replication?
Yes. Amazon S3's Replication feature allows you to replicate objects at a prefix (say, folder) level from one S3 bucket to another, within the same region or across regions.
From the AWS S3 Replication documentation,
The objects that you want to replicate — You can replicate all of the objects in the source bucket or a subset. You identify a subset by providing a key name prefix, one or more object tags, or both in the configuration.
For example, if you configure a replication rule to replicate only objects with the key name prefix Tax/, Amazon S3 replicates objects with keys such as Tax/doc1 or Tax/doc2. But it doesn't replicate an object with the key Legal/doc3. If you specify both prefix and one or more tags, Amazon S3 replicates only objects having the specific key prefix and tags.
Refer to this guide on how to enable replication using the AWS console. Step 4 talks about enabling replication at the prefix level. The same can be done via CloudFormation and the CLI as well.
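If you'd rather do it from the CLI, here's a minimal sketch of a replication configuration with a prefix filter, reusing the Tax/ prefix from the quote above; the bucket names, the role name and the account ID are placeholders, versioning must already be enabled on both buckets, and the role must allow S3 to read the source and write replicas to the destination.

# Replicate only objects whose keys start with Tax/
aws s3api put-bucket-replication \
    --bucket SOURCE_BUCKET \
    --replication-configuration '{
        "Role": "arn:aws:iam::123456789012:role/REPLICATION_ROLE",
        "Rules": [
            {
                "ID": "replicate-tax-prefix",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": { "Prefix": "Tax/" },
                "DeleteMarkerReplication": { "Status": "Disabled" },
                "Destination": { "Bucket": "arn:aws:s3:::DESTINATION_BUCKET" }
            }
        ]
    }'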
Yes, you can do this using the Cross-Region Replication feature. You can replicate the objects either within the same region or to a different one. The replicated objects in the new bucket will keep their original storage class, object names and object permissions.
However, you can change the owner to the new owner of the destination bucket.
However, this feature has some disadvantages:
You cannot replicate objects that were already in the source bucket before you created the replication rule; only objects created after the rule is in place are replicated.
You cannot use SSE-C encryption in replication.
You can also copy just that folder with the sync command (note this is a one-time copy rather than ongoing replication):
aws s3 sync s3://SOURCE_BUCKET_NAME/FOLDER_NAME/ s3://NEW_BUCKET_NAME/FOLDER_NAME/
You must grant the destination account the permissions to perform the cross-account copy.
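For example, if the sync runs under credentials from the destination account, the source bucket needs a policy roughly like the following (the account ID and bucket name are placeholders), and the user or role running the sync needs matching S3 permissions in its own account:

# On the source bucket: allow the destination account to list and read objects
aws s3api put-bucket-policy \
    --bucket SOURCE_BUCKET_NAME \
    --policy '{
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowDestinationAccountRead",
            "Effect": "Allow",
            "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::SOURCE_BUCKET_NAME",
                "arn:aws:s3:::SOURCE_BUCKET_NAME/*"
            ]
        }]
    }'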
I need to update the ACL settings of over 1600 objects that are in over 160 folders in an S3 bucket.
The files have already been uploaded into s3.
Specifically, I need to do the following:
Give the owner (i.e. me) FULL CONTROL
disable anonymous/public READs
give my CloudFront user READ access (as determined by the canonical ID retrieved for the Origin Access Identity)
The files and folders have a standard naming convention:
s3://bucket/videos/XXXX/XXXX.mp4
s3://bucket/videos/XXXX/XXXX.webm
s3://bucket/videos/XXXX/XXXX.ogv
s3://bucket/videos/XXXX/XXXX-pf.jpg
s3://bucket/videos/XXXX/XXXX-lg.jpg
s3://bucket/videos/XXXX/XXXX-sm.jpg
XXXX is replaced by a number between 0001 and 9999
What is the easiest way to do that? Using the console is extremely time-consuming.
I have s3cmd configured on my server. Can s3cmd handle that...and what would the syntax be?
If not s3cmd, what other tool would be available...command line preferred.
S3 bucket policies are the answer. Taken care of. Just thought I'd mention this.
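For anyone landing here later, here's a rough sketch of that approach for the layout above: strip the public READ grants with s3cmd (the object owner keeps FULL_CONTROL), then grant the CloudFront origin access identity read access through a bucket policy. The canonical user ID below is a placeholder for the OAI's canonical ID mentioned in the question, and "bucket" stands in for the real bucket name.

# Reset all objects under videos/ to private (removes the anonymous READ grant)
s3cmd setacl --acl-private --recursive s3://bucket/videos/

# Let the CloudFront OAI read the objects via a bucket policy
aws s3api put-bucket-policy \
    --bucket bucket \
    --policy '{
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowCloudFrontOAIRead",
            "Effect": "Allow",
            "Principal": { "CanonicalUser": "YOUR_OAI_CANONICAL_USER_ID" },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::bucket/videos/*"
        }]
    }'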