Faster way to delete TB of data from GCP cloud storage - google-cloud-platform

I want to delete 2TB of files from the GCP bucket.
I have read the GCP documentation on deletion, and it says to use the gsutil -m rm command, but when I run it the estimated time is 400+ hours.
Is there any faster way to do the deletion process?

For buckets with a very large number of objects, one trick to deleting the contents is to use the Lifecycle Management feature. https://cloud.google.com/storage/docs/lifecycle
Set a lifecycle rule that triggers when the object is 0 days old and an action of "Delete", and that should cause GCS to begin deleting your objects for you. Note that this may still take a while, as lifecycle rules can take up to 24 hours to go into effect, but that's still a lot better than a couple of weeks.
You can configure the lifecycle policy on a bucket from the console:
Head to https://console.cloud.google.com/storage/browser
Find the bucket you want to configure, and click None in the Lifecycle column.
Click Add rule.
Select the condition (object is 0 days old).
Select an action (Delete the object)
Click continue.
Click save.
See https://cloud.google.com/storage/docs/managing-lifecycles for more instructions.
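The same rule can also be applied from the command line. Here is a minimal sketch, assuming gsutil is installed and using placeholder names for the config file and bucket:

    # Sketch: "delete everything" lifecycle rule, set via gsutil instead of the console.
    cat > lifecycle.json <<'EOF'
    {
      "rule": [
        {
          "action": {"type": "Delete"},
          "condition": {"age": 0}
        }
      ]
    }
    EOF
    gsutil lifecycle set lifecycle.json gs://YOUR_BUCKET
    # Once the bucket is empty, remove the rule again (e.g. by setting an empty {} config)
    # before uploading new objects.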
N.B.: Lifecycle changes can take up to 24 hours to go into effect, so once all of your objects go away and you remove the lifecycle config setting, you should wait an additional 24 hours before putting any new files in the bucket, or else they might also get deleted.

Related

AWS S3 Lifecycle

I've been exploring AWS S3 Lifecycle techniques and found the best way to delete S3 files > 60 days old is to configure this through the GUI.
However, I'm not wanting to delete ALL files greater than 60 days. For example, I'd like to at least keep all HTML files inside the bucket that are greater than 60 days.
I've found that a prefix can be entered to limit the scope of the lifecycle rule to specific files; however, this requires me to enter ALL files EXCEPT the HTML ones. We have hundreds of files, so this would take forever.
I was wondering if anyone knew of an easier way? For example, I would like to just exclude all *.html from the lifecycle.
There is no way to exclude objects from rules.
You can rearrange the objects in your bucket so that the rule applies only to objects under a specified prefix ("folder").
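As a rough sketch of what that looks like, here is a prefix-scoped expiration rule applied with the AWS CLI; the bucket name, rule ID, and the logs/ prefix are placeholders for wherever the non-HTML files end up:

    # Sketch: expire only objects under the "logs/" prefix; objects elsewhere are untouched.
    cat > lifecycle.json <<'EOF'
    {
      "Rules": [
        {
          "ID": "expire-logs-after-60-days",
          "Filter": {"Prefix": "logs/"},
          "Status": "Enabled",
          "Expiration": {"Days": 60}
        }
      ]
    }
    EOF
    aws s3api put-bucket-lifecycle-configuration \
      --bucket YOUR_BUCKET \
      --lifecycle-configuration file://lifecycle.json
    # Note: this call replaces the bucket's entire lifecycle configuration,
    # so include any existing rules in the same file.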

AWS S3 delete all the objects or those within a given date range

I am really having a hard time deleting my bucket jananath-logs-bucket-new. It has over 70 TB of data and I need to delete the entire bucket. It has files going back to 2019.
I tried deleting the bucket, but since it has many small files (over 50 million), it takes so much time and the UI (browser) hangs. So I thought, let AWS do it for me.
So I tried lifecycle rules and created two rules:
delete-all-from-start
delete-all-from-start-2
(Screenshots of the two rules and of the resulting lifecycle configuration are omitted here.)
But my objects are not deleted.
I have given the number of days for each field as 1, thinking it would delete everything from 2019 (when the first objects were created).
Can someone help me on this?
How can I delete all the objects in the bucket, going back to 2019?
Is it possible to delete only the objects within a date range, say from 2020-2021?
Thank you,
Have a great day!
According to the documentation, a lifecycle policy is a valid way to empty a bucket. Please note that there may be a delay before expired objects are actually removed:
When an object reaches the end of its lifetime based on its lifecycle policy, Amazon S3 queues it for removal and removes it asynchronously. There might be a delay between the expiration date and the date at which Amazon S3 removes an object.
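Lifecycle rules expire objects by age rather than by an arbitrary creation date range, so for the 2020-2021 part of the question one workaround is to list keys by LastModified and delete those separately. A rough sketch, taking the bucket name from the question and assuming jq is installed (with ~50 million objects the listing itself will be slow, so an S3 Inventory report is a more practical source of keys):

    # Sketch: collect keys last modified in 2020-2021 (ISO timestamps sort lexically).
    aws s3api list-objects-v2 --bucket jananath-logs-bucket-new --output json \
      | jq -r '.Contents[]
               | select(.LastModified >= "2020-01-01" and .LastModified < "2022-01-01")
               | .Key' > keys-to-delete.txt
    # The collected keys can then be removed in batches of up to 1000 per request
    # with "aws s3api delete-objects" (see the batch-delete sketch further down).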

Actual Count Of Objects Incorrect After Deletion Of Bucket

Two days ago I deleted a bucket that contained a backup of all log files for a site. It contained about 30,000 tiny files taking up about 275 MB of space.
I noticed in the Monitoring panel of the site that the object count is exactly the same. I decided to wait a couple of days, and it still has not changed.
The bucket used the Standard storage class, a multi-region location, uniform permissions, and no lifecycle rules.
I can verify that the bucket is gone in the UI as well as with the ls command in Cloud Shell.
(Chart: Cloud Storage object count.)
The count of objects in the Monitoring panel reconciled about two days later.
It looks like the change ended up being retroactive, meaning the past charts were rewritten to reflect the deleted objects.

Delete all version of S3 object using lifecycle rule

I have an S3 bucket with multiple folders and versioning enabled.
Out of these folders I want to completely delete one, as it has multiple delete markers.
I am using a lifecycle rule to delete the objects but am not sure whether it will work for a specific folder.
In the lifecycle rule, I specify folder_name/ as the prefix and an expiration of 1 day after creation for both current and noncurrent versions.
Will it delete all the objects and their versions?
Can someone please confirm?
The other folders are quite critical, so I can't experiment with the rule to test.
I can confirm that you can delete at folder level instead of entire bucket. We have a rule that does the exact same thing (although 7 days instead of 1). I will echo John's point that after initial setup, it will take time to do the deletion. You should see progress STARTING within 1 hour, but actual completion may take a while.
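For reference, here is a hedged sketch of such a rule expressed as an AWS CLI lifecycle configuration; the rule ID is made up, folder_name/ is the prefix from the question, and the expiration windows mirror the 1-day setup described above:

    # Sketch: expire current versions and permanently remove noncurrent versions
    # under folder_name/ only; objects under other prefixes are not affected.
    cat > purge-folder.json <<'EOF'
    {
      "Rules": [
        {
          "ID": "purge-folder-name",
          "Filter": {"Prefix": "folder_name/"},
          "Status": "Enabled",
          "Expiration": {"Days": 1},
          "NoncurrentVersionExpiration": {"NoncurrentDays": 1}
        }
      ]
    }
    EOF
    aws s3api put-bucket-lifecycle-configuration \
      --bucket YOUR_BUCKET \
      --lifecycle-configuration file://purge-folder.json
    # Caution: this replaces the bucket's whole lifecycle configuration, so merge in any
    # existing rules first. Leftover delete markers with no remaining versions may need a
    # follow-up rule with "Expiration": {"ExpiredObjectDeleteMarker": true}, which cannot
    # be combined with "Days" in the same rule.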

Cheapest way to delete 2 billion objects from S3 IA

I have a bucket in S3 (Infrequent Access) containing 2 billion objects. It is too big to delete in the console or over the API without taking years.
I can create a lifecycle rule to expire and delete the objects but the calculator predicts this will cost me >$20,000. Is that correct? Is there a better way to delete a bucket?
I have a file effectively containing a list of all the objects in that bucket if that helps.
Update 2021:
An answer below from #MAP points out that there is now an "Empty" button. I haven't tested it yet, but it looks like the way to go (I'll accept that answer once tested):
If you have a list of all the objects available, then you can certainly use the Multi-Object Delete (DeleteObjects) action. Apparently this API is free. I would create an AWS Step Functions state machine to loop through the file and delete 1000 objects at a time; 1000 keys per request appears to be the limit.
It will take around 2M Step Functions state transitions to delete all the objects in the bucket. Per Step Functions pricing that works out to around $50, plus roughly $1 for the Lambda invocations, so the total cost is roughly $51.
Update
Using Lambda or Step Functions is probably not the most cost-effective option, because either way you will need to read the file (that contains the object keys) from some source such as S3. So running the script from a local machine or inside a screen session on an EC2 Linux instance appears to be the best option; a sketch follows below.
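A rough bash sketch of that local script, assuming keys.txt holds one object key per line (the file of object keys mentioned in the question), jq is installed, and YOUR_BUCKET is a placeholder:

    # Sketch: delete up to 1000 keys per DeleteObjects request (bash 4+ for mapfile).
    while mapfile -t -n 1000 batch && ((${#batch[@]})); do
      payload=$(printf '%s\n' "${batch[@]}" \
        | jq -R . \
        | jq -s '{Objects: map({Key: .}), Quiet: true}')
      aws s3api delete-objects --bucket YOUR_BUCKET --delete "$payload"
    done < keys.txt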
In 2021, anyone who comes across this question may benefit from knowing that the AWS console now provides an "Empty" button.
Select the bucket and click the "Empty" button, and all objects, versioned or not, will be deleted. Depending on the number of objects, it can take minutes to days.
Expiration lifecycle rules are free. From the original feature announcement:
As with standard delete requests, Amazon S3 doesn’t charge you for using Object Expiration.
Delete operations are free. You can create a lifecycle policy to automate a bulk delete.
I would start with a small number of objects first and check the billing report to confirm that the deletes are not charged, then go for the rest.