How to change storage class in S3 the fastest way - amazon-web-services

I have around 7 TB of data in a folder in Amazon S3. I want to change the storage class from Standard to One Zone-IA. But when it's done via the UI it takes too long; it might even take a whole day. What's the fastest way to change the storage class?

You can create a Lifecycle Policy for an S3 Bucket.
This can automatically change the storage class for objects older than a given number of days.
So, this is the "fastest" way for you to request the change.
However, the Lifecycle policy might take up to 24-48 hours to complete, so it might not be the "fastest" to have all the objects transitioned.
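For reference, a rule like this can also be created programmatically. Below is a minimal boto3 sketch, with the bucket name, prefix, and 30-day threshold as placeholder assumptions; note that S3 only allows transitions to One Zone-IA for objects that are at least 30 days old:

import boto3

s3 = boto3.client("s3")

# Transition everything under a prefix to ONEZONE_IA once it is 30 days old.
# Bucket name and prefix below are placeholders.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "transition-to-onezone-ia",
                "Status": "Enabled",
                "Filter": {"Prefix": "my-folder/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "ONEZONE_IA"},
                ],
            }
        ]
    },
)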

You can do it in different ways:
1. Via the console, as you experienced
2. Via lifecycle management
3. Via the AWS CLI
4. Via an AWS SDK (if you know any of the supported programming languages)
You can also change the storage class of an object that is already stored in Amazon S3 to any other storage class by making a copy of the object using the PUT Object - Copy API.
You copy the object in the same bucket using the same key name and specify request headers as follows:
Set the x-amz-metadata-directive header to COPY.
Set the x-amz-storage-class to the storage class that you want to use.
In a versioning-enabled bucket, you cannot change the storage class of a specific version of an object. When you copy it, Amazon S3 gives it a new version ID.
Option 4 would be the fastest way in my case (as a developer): looping through all the objects and copying them with the correct storage class.
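As a minimal boto3 sketch of option 4 (bucket and prefix names are placeholders; note that copy_object is limited to objects up to 5 GB, so larger objects need a multipart copy):

import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"       # placeholder bucket name
prefix = "my-folder/"      # placeholder prefix

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        if obj.get("StorageClass", "STANDARD") == "ONEZONE_IA":
            continue  # already in the target class
        # Copy the object onto itself; StorageClass and MetadataDirective map to
        # the x-amz-storage-class and x-amz-metadata-directive headers above.
        s3.copy_object(
            Bucket=bucket,
            Key=obj["Key"],
            CopySource={"Bucket": bucket, "Key": obj["Key"]},
            StorageClass="ONEZONE_IA",
            MetadataDirective="COPY",
        )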
Hope it helps!

Related

AWS storage class change

I have some files in my AWS S3 bucket which I would like to move from Standard storage to Glacier Deep Archive. After selecting the files and changing the storage class, it gives the following message.
Since the message says that it will make a copy of the files, my question is that will I be charged extra for moving my existing files to another storage class?
Thanks.
"This action creates a copy of the object with updated settings and a new last-modified date. You can change the storage class without making a new copy of the object using a lifecycle rule.
Objects copied with customer-provided encryption keys (SSE-C) will fail to be copied using the S3 console. To copy objects encrypted with SSE-C, use the AWS CLI, AWS SDK, or the Amazon S3 REST API."
Yes, changing the storage class incurs costs, regardless of whether it's done manually or via a lifecycle rule.
If you do it via the console, it will create a Deep Archive copy but will retain the existing object as a previous version (if you have versioning enabled), so you'll be charged for storing both (until you delete the original version).
If you do it via a lifecycle rule, it will transition (not copy) the files, so you'll only pay for storage for the new storage class.
In both cases, you'll have to pay for LIST ($0.005 per 1000 objects in STANDARD class) and COPY/PUT ($0.05 per 1000 objects going to DEEP_ARCHIVE class) actions.
Since data is being moved within the same bucket (and therefore within the same region), there will be no data transfer fees.
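As a rough worked example using the prices quoted above (check current S3 pricing for your region), transitioning 100,000 objects would cost on the order of a few dollars in request charges:

# Back-of-the-envelope request cost for 100,000 objects, using the prices quoted above
num_objects = 100_000
list_cost = num_objects / 1000 * 0.005   # LIST requests (STANDARD)
copy_cost = num_objects / 1000 * 0.05    # COPY/PUT requests (into DEEP_ARCHIVE)
print(list_cost, copy_cost, list_cost + copy_cost)   # 0.5, 5.0, 5.5 dollars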
The only exception to this pricing is the "intelligent tiering" class, which automatically shifts objects between storage classes based on frequency of access and does not charge for shifting classes.
No additional tiering fees apply when objects are moved between access tiers within the S3 Intelligent-Tiering storage class.

How to make a Google Cloud Storage bucket append-only?

Is there a way to make a Google Cloud Storage bucket "append-only"?
To clarify, I want to make it so that trying to overwrite/modify an existing object returns an error.
Right now the only way I see to do this is client-side, by checking if the object exists before trying to write to it, but that doubles the number of calls I need to make.
There are several Google Cloud Storage features that you can enable:
Object Versioning
Bucket Lock
Retention Policies
The simplest method is to implement Object Versioning. This prevents objects from being overwritten or deleted. This does require changes to client code to know how to request a specific version of an object if multiple versions have been created due to object overwrites and deletes.
Cloud Storage Object Versioning
For more complicated scenarios, implement Bucket Lock and retention policies. These features allow you to configure a data retention policy for a Cloud Storage bucket that governs how long objects in the bucket must be retained.
Retention policies and Bucket Lock
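As a minimal sketch with the google-cloud-storage Python client (bucket name and retention period are placeholders), enabling Object Versioning, with a retention policy shown as the alternative since Cloud Storage does not allow both on the same bucket:

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-append-only-bucket")  # placeholder name

# Option 1: Object Versioning -- overwrites and deletes keep the old data as
# noncurrent versions instead of destroying it.
bucket.versioning_enabled = True
bucket.patch()

# Option 2 (alternative): a retention policy, e.g. 30 days, so objects cannot be
# deleted or replaced until the retention period expires.
# bucket.retention_period = 30 * 24 * 60 * 60  # seconds
# bucket.patch()
# Locking the policy (Bucket Lock) makes it permanent and irreversible:
# bucket.lock_retention_policy()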

Approach to move file from s3 to s3 glacier

I need to create a Python Flask application that moves a file from S3 storage to S3 Glacier. I cannot use a lifecycle policy to do this because I need to use Glacier Vault Lock, which isn't possible with the lifecycle approach since I won't be able to use any Glacier features on those files. The files will be multiple GBs in size, so I need to download these files and then upload them to Glacier. I was thinking of adding a script on EC2 that will be triggered by Flask and will start downloading and uploading files to Glacier.
This is the only solution I have come up with, and it doesn't seem very efficient, but I'm not sure. I am pretty new to AWS, so any tips or thoughts will be appreciated.
Not posting any code, as I don't really have a problem with the coding, just with the approach I should take.
It appears that your requirement is to use Glacier Vault Lock on some objects to guarantee that they cannot be deleted within a certain timeframe.
Fortunately, similar capabilities have recently been added to Amazon S3, called Amazon S3 Object Lock. This works at the object or bucket level.
Therefore, you could simply use Object Lock instead of moving the objects to Glacier.
If the objects will be infrequently accessed, you might also want to change the storage class to something cheaper before locking them.
See: Introduction to Amazon S3 Object Lock - Amazon Simple Storage Service
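A minimal boto3 sketch of that approach (bucket name, key, and the one-year retention are placeholders; Object Lock has to be enabled when the bucket is created):

import boto3
from datetime import datetime, timedelta, timezone

s3 = boto3.client("s3")
bucket = "my-locked-bucket"  # placeholder name

# Object Lock can only be enabled at bucket creation time (this also enables versioning).
# Outside us-east-1 you also need a CreateBucketConfiguration with your region.
s3.create_bucket(Bucket=bucket, ObjectLockEnabledForBucket=True)

# Upload an object and place a compliance-mode retention on it; in COMPLIANCE mode
# the lock cannot be shortened or removed by any user until the date passes.
s3.put_object(Bucket=bucket, Key="report.csv", Body=b"example data")
s3.put_object_retention(
    Bucket=bucket,
    Key="report.csv",
    Retention={
        "Mode": "COMPLIANCE",
        "RetainUntilDate": datetime.now(timezone.utc) + timedelta(days=365),
    },
)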

AWS S3 deletion of files that haven't been accessed

I'm writing a service that takes screenshots of a lot of URLs and saves them in a public S3 bucket.
Due to storage costs, I'd like to periodically purge the aforementioned bucket and delete every screenshot that hasn't been accessed in the last X days.
By "accessed" I mean downloaded or acquired via a GET request.
I checked out the documentation and found a lot of ways to define an expiration policy for an S3 object, but couldn't find a way to "mark" a file as read once it's been accessed externally.
Is there a way to define the periodic purge without code (only AWS rules/services)? Does the API even allow that or do I need to start implementing external workarounds?
You can use Amazon S3 Storage Class Analysis:
By using Amazon S3 analytics storage class analysis you can analyze storage access patterns to help you decide when to transition the right data to the right storage class. This new Amazon S3 analytics feature observes data access patterns to help you determine when to transition less frequently accessed STANDARD storage to the STANDARD_IA (IA, for infrequent access) storage class.
After storage class analysis observes the infrequent access patterns of a filtered set of data over a period of time, you can use the analysis results to help you improve your lifecycle policies.
Even if you don't use it to change Storage Class, you can use it to discover which objects are not accessed frequently.
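A boto3 sketch of enabling storage class analysis (bucket names, configuration ID, and prefix are placeholders; results are exported as a daily CSV to the destination bucket):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_analytics_configuration(
    Bucket="screenshot-bucket",                  # bucket to analyze (placeholder)
    Id="access-pattern-analysis",
    AnalyticsConfiguration={
        "Id": "access-pattern-analysis",
        "StorageClassAnalysis": {
            "DataExport": {
                "OutputSchemaVersion": "V_1",
                "Destination": {
                    "S3BucketDestination": {
                        "Format": "CSV",
                        "Bucket": "arn:aws:s3:::analytics-results-bucket",  # placeholder
                        "Prefix": "storage-class-analysis/",
                    }
                },
            }
        },
    },
)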
There is no such service provided by AWS. You will have to write your own solution.

AWS S3 How do I enable S3 object encryption for object that existed before

Had a series of buckets that did not have encryption turned on. The boto3 code to turn it on is easy, just basic AES256.
Unfortunately, any object that already exists will not have server-side encryption set. I've been looking at the API and cannot find the call to change the attribute. Via the console it is there, but I am not about to do that with 10,000 objects.
Not willing to copy that much data out and then back in again.
The S3 object put looks like it expects to write a new object; it does not seem to update an existing object.
Anyone willing to offer a pointer?
Amazon S3 has the ability to do a COPY operation where the source file and the destination file are the same (in object name only). This copy operation happens on S3, which means that you do not need to download and reupload the file.
To turn on encryption for a file, known as server-side encryption (SSE, AES-256), you can use the AWS CLI copy command:
aws s3 cp s3://mybucket/myfile.zip s3://mybucket/myfile.zip --sse
The source file will be copied to the destination (notice the same object names) and SSE will be enabled (the file will be encrypted).
If you have a list of files, you could easily create a batch script to process each file.
Or you could write a simple Python program to scan each file on S3 and, if SSE is not enabled, encrypt it with the AWS CLI command or with the Python S3 APIs.
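A rough sketch of that scan-and-copy approach in Python (bucket name is a placeholder, and SSE-S3/AES256 is assumed as the target):

import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # placeholder bucket name

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        head = s3.head_object(Bucket=bucket, Key=key)
        if head.get("ServerSideEncryption") == "AES256":
            continue  # already encrypted with SSE-S3
        # Copy the object onto itself, requesting SSE-S3 encryption
        s3.copy_object(
            Bucket=bucket,
            Key=key,
            CopySource={"Bucket": bucket, "Key": key},
            ServerSideEncryption="AES256",
            MetadataDirective="COPY",
        )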
I've been reading and talking to friends. I tried something for the heck of it.
aws s3 cp s3://bucket/tools/README.md s3://bucket/tools/README.md
Encryption was turned on. Is AWS smart enough to recognize this and just apply the bucket's encryption policy? Or did it really re-copy the object on top of itself?
You can do something like this to copy object between buckets and encrypt them.
But copying is not without side effects; to understand what happens behind the scenes, we have to look at the S3 user guide.
Each object has metadata. Some of it is system metadata and other user-defined. Users control some of the system metadata such as storage class configuration to use for the object, and configure server-side encryption. When you copy an object, user-controlled system metadata and user-defined metadata are also copied. Amazon S3 resets the system controlled metadata. For example, when you copy an object, Amazon S3 resets creation date of copied object. You don't need to set any of these values in your copy request.
You can find more about metadata here.
Note that if you choose to update any of the object's user-configurable metadata (system or user-defined) during the copy, you must explicitly specify all of the user-configurable metadata present on the source object in your request, even if you are changing only one of the values.
You will also have to pay for copy requests; however, there won't be any charge for delete requests. Since there is no need to copy objects between regions in this case, you won't be charged for bandwidth.
So keep these points in mind when you go ahead with copying objects in S3.