Permanently delete all delete-marked objects in a versioned S3 bucket

I have an S3 bucket with versioning enabled, and I want to permanently delete all the delete-marked object versions from the bucket using a lifecycle rule.
Which of the options below do I need to choose in order to permanently delete these object versions?
Note that the delete-marked objects may also be the current version.

Let's first understand what current and noncurrent versions are.
Whenever an object in a versioned bucket is deleted, the current object version becomes noncurrent and the delete marker becomes the current version.
What is an expired object delete marker?
A delete marker with zero noncurrent versions is referred to as an expired object delete marker.
So options 4 and 5 will solve your purpose:
Option 4 will permanently delete noncurrent versions, which makes the delete marker expired, since there will be no noncurrent versions left.
Option 5 will delete the expired object delete markers.
Note: Lifecycle rules take time to take effect, as objects are queued and processed asynchronously.
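For reference, here is a minimal sketch of what such a rule could look like when applied through boto3 (the bucket name, rule ID, and day count are placeholders):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-versioned-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "purge-noncurrent-versions-and-expired-markers",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Status": "Enabled",
                # Option 4: permanently delete noncurrent versions
                "NoncurrentVersionExpiration": {"NoncurrentDays": 1},
                # Option 5: remove the resulting expired object delete markers
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            }
        ]
    },
)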

Related

S3 Versioning and Lifecycle rules - Will "Expire current versions of objects" also delete the delete markers?

I am having trouble understanding the relationship between Delete markers and Lifecycle rules in S3 with versioning.
Here is my lifecycle rule:
Does this mean that the Expire current versions of objects option would delete a delete marker after it becomes the "current" version? AKA 180 days after the actual current version is deleted?
From what I understand this would mean:
After 180 days the current version of example.txt would expire, a delete marker is created for it and the current version becomes a noncurrent version attached to the delete marker
Another 180 days later, the noncurrent example.txt would be permanently deleted and the "current" version (the delete marker) would "expire", and since it is a delete marker that means it would be permanently deleted as well
Is this an accurate understanding or do I need to make an additional lifecycle rule that deletes expired delete markers?
Thank you!
Yes, you are correct.
A delete marker on the current version makes it a noncurrent version.
After another 180 days, since you are deleting noncurrent versions, the delete markers for those noncurrent versions will be deleted too.
An expired object delete marker is removed once all noncurrent versions of an object have expired after deleting a versioned object.
That is why you cannot select "Expire current versions of objects" and "Delete expired object delete markers" together in the same rule.
Refer to Removing expired object delete markers and Understanding delete markers for more detail.
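For orientation, the rule described in the question maps roughly onto the following lifecycle configuration fields (a sketch only; the rule ID is made up):

lifecycle_rule = {
    "ID": "expire-after-180-days",  # made-up rule name
    "Filter": {"Prefix": ""},
    "Status": "Enabled",
    # "Expire current versions of objects" after 180 days
    "Expiration": {"Days": 180},
    # "Permanently delete noncurrent versions" 180 days after they become noncurrent
    "NoncurrentVersionExpiration": {"NoncurrentDays": 180},
}
# Note: "Delete expired object delete markers" (ExpiredObjectDeleteMarker: True)
# cannot be added to this rule, because Expiration already specifies Days.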

Is it Possible to Delete AWS S3 Objects Based on Object Size

I don't seem to find any documentation about deleting S3 objects based on the object size. For example, if an object's size is less than 5 bytes, then delete it.
From your comments, it appears you want to delete objects immediately after creation if they are smaller than a given size.
To do this, you would:
Create an AWS Lambda function
Configure the S3 bucket to trigger the Lambda function when an object is created
The Lambda function will be passed the Bucket and Key of the object(s) that was/were just created. It can then call HeadObject to obtain the size of the object. If it is smaller than the desired size, it can then call DeleteObject. Make sure to loop through all passed-in Records because one Lambda function can be invoked with multiple input objects.
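A minimal sketch of such a handler with boto3, assuming the 5-byte threshold from the question:

import boto3
from urllib.parse import unquote_plus

s3 = boto3.client("s3")
MIN_SIZE_BYTES = 5  # threshold from the question; adjust as needed

def lambda_handler(event, context):
    # One S3 event notification can carry several Records
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
        if size < MIN_SIZE_BYTES:
            s3.delete_object(Bucket=bucket, Key=key)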
If you have existing objects on which you wish to perform this operation, and since you mentioned that there are "over 1 million objects", you could use Amazon S3 Inventory, which can provide a daily or weekly CSV file listing all objects, including their size. You can write a program that uses this file as input and then calls DeleteObjects to delete up to 1000 objects at a time.
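A rough sketch of that batch approach, assuming the inventory report has been downloaded locally and lists Bucket, Key and Size as its first three columns (check your inventory configuration, as the column layout varies):

import csv
import boto3

s3 = boto3.client("s3")
BUCKET = "my-bucket"              # placeholder bucket name
INVENTORY_CSV = "inventory.csv"   # local copy of the S3 Inventory report
MIN_SIZE_BYTES = 5

to_delete = []
with open(INVENTORY_CSV, newline="") as f:
    # Assumed column order: Bucket, Key, Size, ...
    for bucket, key, size, *rest in csv.reader(f):
        if int(size) < MIN_SIZE_BYTES:
            to_delete.append({"Key": key})

# DeleteObjects accepts at most 1000 keys per request
for i in range(0, len(to_delete), 1000):
    s3.delete_objects(Bucket=BUCKET, Delete={"Objects": to_delete[i:i + 1000]})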
Yes, it is possible to delete S3 objects based on size.
One workaround is to retrieve the object sizes in the S3 bucket via the AWS CLI (you can use the CLI or boto3) and run a cron job that deletes an object when the condition is true, i.e. the object size is less than 5 bytes.
The DeleteObject() API call does not accept parameters such as Size or ModifiedDate.
Instead, you must provide a list of object(s) to be deleted.
If you wish to delete objects based on their size, the typical pattern would be (see the sketch after this list):
Call ListObjects() to obtain a listing of objects in the bucket (and optionally in a given prefix)
In your code, loop through the returned information and examine the object size. Where the size is smaller/larger than desired, add the Key (filename) to an array
Call DeleteObjects(), passing the array of Keys to be deleted
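A sketch of that pattern in boto3, assuming a placeholder bucket name and the 5-byte threshold from the question:

import boto3

s3 = boto3.client("s3")
BUCKET = "my-bucket"   # placeholder bucket name
MIN_SIZE_BYTES = 5

# 1. List the objects (ListObjectsV2 returns at most 1000 per page, so paginate)
keys = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):  # add Prefix="..." to narrow the scan
    for obj in page.get("Contents", []):
        # 2. Keep the keys whose size matches the condition
        if obj["Size"] < MIN_SIZE_BYTES:
            keys.append({"Key": obj["Key"]})

# 3. Delete them, up to 1000 keys per DeleteObjects call
for i in range(0, len(keys), 1000):
    s3.delete_objects(Bucket=BUCKET, Delete={"Objects": keys[i:i + 1000]})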

Create lifecycle rule to delete ONLY previous versions of S3 Objects

I am trying to 1) enable versioning on S3 buckets and 2) delete previous versions after, say, 30 days. Do you know which lifecycle rule I should be setting to achieve 2)?
One of the rules is Permanently delete previous versions of objects. Under that rule, you need to set Number of days after objects become previous versions. The language in the public doc is not clear. Does that number mean that after that number of days the current S3 object becomes a previous version and gets deleted? In that case I will lose the S3 objects, right?
Can someone tell me if my above understanding is correct?
Which rule should I set so that current version will be intact and only previous versions to be deleted after 30 days?
I looked at these examples, but all of them attempt to simply delete any S3 object that is older than 30 days. I am trying to delete only the previous versions of the objects.
Examples
1: Deleting old object versions on AWS S3
2: AWS: Delete Permanently S3 objects less than 30 days using 'Lifecycle Rule'
Thanks,
Pavan
You would use "Permanently delete previous versions of objects".
You would then enter "Number of days after objects become previous versions", which tells it to delete a version that many days after it stopped being the current version.
A Version will only ever become a Previous (non-current) version if a new version of the object is uploaded with the same name.
Does that number mean after those number of days, the current S3 object becomes previous and gets deleted
The current version will remain the current version until a user (eg You!) uploads a file with the same name. That will become the current version, and the "Current" version that was previously there becomes a "Previous" version.
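As an illustration, here is roughly how that console option could be expressed through boto3 (the bucket name and rule ID are placeholders):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-versioned-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "delete-previous-versions-after-30-days",
                "Filter": {"Prefix": ""},
                "Status": "Enabled",
                # Current versions are left untouched; only previous (noncurrent)
                # versions are removed 30 days after they become noncurrent.
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
            }
        ]
    },
)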

How does Snowflake internally perform updates?

As far as I know, the underlying files (columnar format) are immutable. My question is: if files are immutable, how are updates performed? Does Snowflake maintain different versions of the same row and return the latest version based on a key? Or does it insert the data into new files behind the scenes and delete the old files? How is performance affected in these scenarios (querying current data) if time travel is set to 90 days, since Snowflake needs to maintain different versions of the same row? And since Snowflake doesn't enforce keys, how are the different versions even detected? Any insights (documents/videos) on the detailed internals are appreciated.
It's a complex question, but the basic ideas are as follows (quite a bit simplified):
records are stored in immutable micro-partitions on S3
a table is a list of micro-partitions
when a record is modified
its old micro-partition is marked as inactive (from that moment),
a new micro-partition is created, containing the modified record, but also other records from that micro-partition.
the new micro-partition is added to the table's list (marked as active from that moment)
inactive micro-partitions are not deleted for some time, allowing time-travel
So Snowflake doesn't need a record key, as each record is stored in only one file active at a given time.
The impact of performing updates on querying is marginal, the only visible impact might be that the files need to be fetched from S3 and cached on the warehouses.
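To make the mechanism above concrete, here is a small toy model in Python (purely illustrative; it is not Snowflake's actual code or data structures):

import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MicroPartition:
    records: dict                          # record_id -> row (columnar in reality)
    created_at: float
    inactive_from: Optional[float] = None  # set when superseded; kept for time travel

@dataclass
class Table:
    partitions: list = field(default_factory=list)

    def insert(self, records: dict) -> None:
        self.partitions.append(MicroPartition(records, time.time()))

    def update(self, record_id, new_row) -> None:
        now = time.time()
        for mp in self.partitions:
            if mp.inactive_from is None and record_id in mp.records:
                mp.inactive_from = now  # old micro-partition stays for time travel
                # New micro-partition carries the modified record plus its neighbours
                self.partitions.append(
                    MicroPartition({**mp.records, record_id: new_row}, now)
                )
                return

    def scan(self, as_of: Optional[float] = None) -> dict:
        # Read the micro-partitions that were active at the requested point in time
        as_of = time.time() if as_of is None else as_of
        rows = {}
        for mp in self.partitions:
            if mp.created_at <= as_of and (mp.inactive_from is None or mp.inactive_from > as_of):
                rows.update(mp.records)
        return rows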
For more info, I'd suggest going to Snowflake forums and asking there.

Sitecore - delete bucket subitems

I have an items importer which deletes all of an item's subitems and creates new subitems. Recently I switched it to buckets and now I have a problem with deleting.
I delete items using:
myItem.DeleteChildren();
Without the bucket it took about 20 minutes. Now it takes about 1 hour for 5k items. Do I need to revert the bucket item before deleting and then synchronize again?
What is the quickest way to delete bucketable items?
My guess is, your deletion takes longer now because the bucket is updating indexes while deleting. While you could speed it up by disabling events around your .DeleteChildren call, you would still need to get those indexes updated for your bucket to function properly.
So to answer your question, there isn't a way to speed it up while still retaining full functionality.
If you want to test this in action, try the following:
using(new EventDisabler()) myItem.DeleteChildren();
It should bring the deletion speed up to where it was before, but at the price of a bucket that will not work properly until its indexes have been rebuilt.
I would recommend you adopt an integration approach where complete deletion and rebuilding of your item store is not required.
You could take it a step further. I was able to import 30k items in minutes by disabling all three (security, events, and proxies):
using (new Sitecore.SecurityModel.SecurityDisabler())
{
    using (new Sitecore.Data.Events.EventDisabler())
    {
        using (new ProxyDisabler())
        {
            // delete code here
        }
    }
}
If you have a lot of items in the bucket, it will execute events on each item that is being deleted. Put your delete code inside an event disabler:
using (new EventDisabler())
{
    myItem.DeleteChildren();
}
That will stop all the events from firing and should be considerably quicker. As a caveat, the indexes will not be updated instantly when doing this, so you might want to run an index update on the master DB after your importer has run.
Another option would be to update existing items in the import rather than deleting all the items first.
Maybe (programmatically) unbucket before calling DeleteChildren()? I don't know how long that will take in total, but it might be quicker than it is now.