AWS S3 Lifecycle

I've been exploring AWS S3 Lifecycle techniques and found the best way to delete S3 files > 60 days old is to configure this through the GUI.
However, I don't want to delete ALL files older than 60 days. For example, I'd like to keep all HTML files in the bucket that are older than 60 days.
I've found that a prefix can be entered to limit the scope of the lifecycle rule to specific files; however, this would require me to enter ALL files EXCEPT the HTML ones. We have hundreds of files, so this would take forever.
I was wondering if anyone knew of an easier way? For example, I would like to just exclude all *.html from the lifecycle.

There is no way to exclude objects from rules.
You can rearrange the objects in your bucket so that a rule is applied only to objects under a specified prefix ("folder").
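If the non-HTML files can be grouped under a common prefix, the rule can then be scoped to just that prefix. A minimal sketch with the AWS CLI, assuming a hypothetical bucket my-bucket whose expirable files live under data/ (note that this call replaces any lifecycle configuration already on the bucket):

# Expire only objects under the data/ prefix after 60 days;
# bucket name, prefix, and rule ID are placeholders
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "expire-data-after-60-days",
        "Filter": { "Prefix": "data/" },
        "Status": "Enabled",
        "Expiration": { "Days": 60 }
      }
    ]
  }'

HTML files left outside the data/ prefix would be untouched by the rule.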

Related

AWS S3: delete all the objects, or objects within a given date range

I am really having a hard time deleting my bucket jananath-logs-bucket-new. It has over 70 TB of data and I need to delete the entire bucket. It has files going back to 2019.
I tried deleting the bucket, but since it has so many small files (over 50 million), it takes a very long time and the UI (browser) hangs. So I thought, let AWS do it for me.
So I tried lifecycle rules and created these two rules:
delete-all-from-start
delete-all-from-start-2
(Screenshots of the two rules omitted.)
But my objects are not deleted.
I set the number of days in each field to 1, thinking it would delete everything back to 2019 (when the first object was created).
Can someone help me with this?
How can I delete all the objects in the bucket going back to 2019?
Is it possible to delete objects within a date range, say from 2020 to 2021?
Thank you,
Have a great day!
According to the documentation, a lifecycle policy is a valid way to empty a bucket. Please note that there may be a delay before objects are expired:
When an object reaches the end of its lifetime based on its lifecycle policy, Amazon S3 queues it for removal and removes it asynchronously. There might be a delay between the expiration date and the date at which Amazon S3 removes an object.
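For reference, the same thing expressed as a single rule via the AWS CLI (a sketch; the noncurrent-version and multipart-upload settings only matter if versioning is or was enabled, and this call replaces any lifecycle configuration already on the bucket):

# One rule applying to every object in the bucket (empty prefix)
aws s3api put-bucket-lifecycle-configuration \
  --bucket jananath-logs-bucket-new \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "delete-everything",
        "Filter": { "Prefix": "" },
        "Status": "Enabled",
        "Expiration": { "Days": 1 },
        "NoncurrentVersionExpiration": { "NoncurrentDays": 1 },
        "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 1 }
      }
    ]
  }'

As for the second question: lifecycle expiration is based on object age (or a fixed date), not a creation-date range, so deleting only objects created between 2020 and 2021 would need a script or S3 Batch Operations instead.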

Amazon S3 - Can prefixes include the start of the file name?

I have several types of files within an Amazon S3 bucket, all of which are in the same folder. There are three "types" of files that I want to apply different transition/delete days to, and all of their filenames start the same way. I am wondering if prefixes for files need to just address folders, or if they can include the start of the filename as well. For example, the files start with data_file_*, log_file_*, and error_file_*. If they are all in a folder files/, can I set a rule with the prefix being files/error_file_? If so, is that syntax correct?
Note that changing the directory structure is not an option for me, and the AWS documentation doesn't have any examples like this, or any related comments that I can find.
The use case you describe is actually exactly how lifecycle rules work. S3 has no concept of "folders" (even though it looks that way in the AWS console); it only understands object keys that happen to have slashes in them. This is typical of object-based storage (S3), in contrast to file storage (your laptop).
So when creating lifecycle rules, use the full key prefix of the objects (files/error_file_). The rule will then be applied to all objects whose keys start with that prefix.
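As an illustrative sketch (bucket name and day counts are made up), separate rules can key off each filename prefix:

# Different lifecycle actions per filename prefix within the files/ "folder"
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "expire-error-files",
        "Filter": { "Prefix": "files/error_file_" },
        "Status": "Enabled",
        "Expiration": { "Days": 30 }
      },
      {
        "ID": "archive-log-files",
        "Filter": { "Prefix": "files/log_file_" },
        "Status": "Enabled",
        "Transitions": [ { "Days": 30, "StorageClass": "GLACIER" } ]
      }
    ]
  }'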

Delete all version of S3 object using lifecycle rule

I have an S3 bucket with multiple folders and versioning enabled.
Out of these folders I want to completely delete one folder, as it has multiple delete markers.
I am using a lifecycle rule to delete the objects, but I'm not sure if it will work for a specific folder.
In the lifecycle rule, if I specify folder_name/ as the prefix and set the expiration rule to 1 day after creation for current and noncurrent versions, will it delete all the objects and their versions?
Can someone please confirm?
The other folders are quite critical, so I can't mess with the rule to test.
I can confirm that you can delete at the folder level instead of the entire bucket. We have a rule that does the exact same thing (although with 7 days instead of 1). I will echo John's point that after the initial setup it will take time to do the deletion. You should see progress STARTING within 1 hour, but actual completion may take a while.
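For reference, a sketch of such a rule with the AWS CLI (the bucket name is a placeholder; note that this call replaces the bucket's entire lifecycle configuration, so include any existing rules you want to keep):

# Expire current versions and permanently delete noncurrent versions,
# but only for keys under folder_name/ -- other prefixes are untouched
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "purge-folder-name",
        "Filter": { "Prefix": "folder_name/" },
        "Status": "Enabled",
        "Expiration": { "Days": 1 },
        "NoncurrentVersionExpiration": { "NoncurrentDays": 1 }
      }
    ]
  }'
# Delete markers whose versions are all gone can be cleaned up afterwards with a
# separate rule using "Expiration": { "ExpiredObjectDeleteMarker": true }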

AWS S3: Do Lifecycle rules accept regex?

I have an s3 bucket with "folders" folder1, folder2, folder3, folder4. In folder2 and folder3 there is a "new" folder. I need to delete everything in "new", older than 1 day. Can I do that with a rule like /*/new/ ? Some guys say they have seen such rules work in the past, but that particular definition does nothing.
(In the real bucket there are folder1, folder2 ... folder3001 so I can't make rules for every folder, so please don't suggest that. The above example is for simplicity only.)
The PUT lifecycle API takes a "Prefix", which, as the name says, is a prefix, not a regex.
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUTlifecycle.html
There is also a limit of 1000 rules per bucket.
http://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html
You could change your folder structure so that keys look like "new/folderN".
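Since the prefix is not a regex and there can be thousands of folders, another workaround is a small scheduled cleanup script instead of a lifecycle rule. A rough sketch (bucket name is a placeholder; assumes GNU date and keys without whitespace):

#!/bin/bash
# Delete objects under any */new/ "folder" that are more than one day old
cutoff=$(date -u -d '1 day ago' +%s)
aws s3api list-objects-v2 \
  --bucket my-bucket \
  --query "Contents[?contains(Key, '/new/')].[Key,LastModified]" \
  --output text |
while read -r key lastmod; do
  if [ "$(date -u -d "$lastmod" +%s)" -lt "$cutoff" ]; then
    aws s3 rm "s3://my-bucket/$key"
  fi
done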

How do I delete/count objects in a s3 bucket?

So I know this is a common question but there just doesn't seem to be any good answers for it.
I have a bucket with gobs of files in it (I have no clue how many). They are all around 2 KB apiece.
1) How do I figure out how many of these files I have WITHOUT listing them?
I've used the s3cmd.rb, aws/s3, and jets3t stuff, and the best I can find is a command to count the first 1000 records (really performing GETs on them).
I've been using jets3t's applet as well because it's really nice to work with, but even then I can't list all my objects because I run out of heap space (presumably because it is performing GETs on all of them and keeping them in memory).
2) How can I just delete a bucket?
The best thing I've seen is a parallelized delete loop, and that has problems because it sometimes tries to delete the same file. This is what all the 'deleteall' commands that I've run across do.
What do you guys do who have boasted about hosting millions of images/txts?? What happens when you want to remove them?
3) Lastly, are there alternate answers to this? All of these files are txt/xml files so I'm not even sure S3 is such a concern -- maybe I should move this to a document database of sorts??
What it boils down to is that the Amazon S3 API is just straight-out missing 2 very important operations -- COUNT and DEL_BUCKET. (Actually there is a delete-bucket command, but it only works when the bucket is empty.) If someone comes up with a method that does not suck to do these two operations, I'd gladly give up lots of bounty.
UPDATE
Just to answer a few questions: the reason I ask this is that for the past year or so I have been storing hundreds of thousands, more like millions, of 2k txt and xml documents. The last time I wanted to delete the bucket, a couple of months ago, it literally took DAYS to do so because the bucket has to be empty before you can delete it. This was such a pain in the ass that I fear ever having to do it again without API support for it.
UPDATE
This rocks the house!
http://github.com/SFEley/s3nuke/
I rm'd a good couple of gigs' worth of 1-2k files within minutes.
I am most certainly not one of those 'guys who have boasted about hosting millions of images/txts', as I only have a few thousand, and this may not be the answer you are looking for, but I looked at this a while back.
From what I remember, there is an API command called HEAD which gets information about an object rather than retrieving the complete object (which is what GET does), and that may help in counting the objects.
As far as deleting buckets goes, at the time I was looking, the API definitely stated that the bucket had to be empty, so you need to delete all the objects first.
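For what it's worth, the modern AWS CLI equivalent of such a HEAD request would be something like this (bucket and key are placeholders):

# Returns only metadata (size, content type, ETag), not the object body
aws s3api head-object --bucket my-bucket --key path/to/file.txt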
But I never used either of these commands, because I was using S3 as a backup, and in the end I wrote a few routines that uploaded the files I wanted to S3 (so that part was automated) but never bothered with the restore/delete/file-management side of the equation. For that I use Bucket Explorer, which does all I need. In my case it wasn't worth spending the time when for $50 I can get a program that does everything I need. There are probably others that do the same (e.g. CloudBerry).
In your case, with Bucket Explorer you can right-click on a bucket and select Delete, or right-click and select Properties and it will count the number of objects and the size they take up. It certainly does not download the objects themselves. (E.g. the last bucket I looked at was 12 GB and around 500 files; it would take hours to download 12 GB, whereas the size and count are returned in a second or two.) And if there is a limit, it certainly isn't 1000.
Hope this helps.
"List" won't retrieve the data. I use s3cmd (a python script) and I would have done something like this:
s3cmd ls s3://foo | awk '{print $4}' | split -a 5 -l 10000 bucketfiles_
for i in bucketfiles_*; do xargs -n 1 s3cmd rm < $i & done
But first check how many bucketfiles_ files you get. There will be one s3cmd running per file.
It will take a while, but not days.
1) Regarding your first question, you can list the items in a bucket without actually retrieving them. You can do that with both the SOAP and the REST API. You can define the maximum number of items to list and the position to start the listing from (the marker). Read more about it here.
I do not know of any implementation of the paging, but especially for the REST interface it would be very easy to implement in any language.
2) I believe the only way to delete a bucket is to first empty it of all items. See also this question.
3) I would say that S3 is very well suited to storing a large number of files. It depends, however, on what you want to do. Do you plan to also store binary files? Do you need to perform any queries, or is just listing the files enough?
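With today's AWS CLI the same marker-based listing looks roughly like this (bucket name is a placeholder); neither command downloads any object data:

# One page of at most 1,000 keys; the response includes NextToken when
# more remain (pass it back via --starting-token to get the next page)
aws s3api list-objects-v2 --bucket my-bucket --max-items 1000

# Or let the CLI page through everything and just count the keys
aws s3api list-objects-v2 --bucket my-bucket \
  --query 'length(Contents)' --output text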
I've had the same problem with deleting hundreds of thousands of files from a bucket. It may be worthwhile to fire up an EC2 instance to run the parallel delete because the latency to S3 is low. I think there's some money to be made hosting a bunch of EC2 servers and charging people to delete buckets quickly. (At least until Amazon gets around to changing the API)
Old thread, but still relevant as I was looking for the answer until I just figured this out. I wanted a file count using a GUI-based tool (i.e. no code). I happen to already use a tool called 3Hub for drag & drop transfers to and from S3. I wanted to know how many files I had in a particular bucket (I don't think billing breaks it down by buckets).
So, using 3Hub,
- list the contents of the bucket (looks basically like a finder or explorer window)
- go to the bottom of the list, click 'show all'
- select all (ctrl+a)
- choose copy URLs from right-click menu
- paste the list into a text file (I use TextWrangler for Mac)
- look at the line count
I had 20521 files in the bucket and did the file count in less than a minute.
I'd like to know if anyone's found a better way since this would take some time on hundreds of thousands of files.
To count objects in an S3 bucket:
Go to AWS Billing, then Reports, then AWS Usage Reports.
Select Amazon Simple Storage Service, then Operation StandardStorage.
Download the CSV file; the rows with a UsageType of StorageObjectCount list the item count for each bucket.
Count
aws s3 ls s3://mybucket/ --recursive | wc -l
From this post
Delete
aws s3 rm --recursive s3://mybucket/ && aws s3 rb s3://mybucket/
This deletes every item and then the bucket itself.
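An equivalent shortcut, for what it's worth; note that in a versioning-enabled bucket neither form removes old object versions or delete markers, so the bucket deletion would still fail there:

# Remove every (current) object and then the bucket in one command
aws s3 rb s3://mybucket --force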