Is there a way to make a Google Cloud Storage bucket "append-only"?
To clarify, I want to make it so that trying to overwrite/modify an existing object returns an error.
Right now the only way I see to do this is client-side, by checking if the object exists before trying to write to it, but that doubles the number of calls I need to make.
There are several Google Cloud Storage features that you can enable:
Object Versioning
Bucket Lock
Retention Policies
The simplest method is to enable Object Versioning. This preserves a noncurrent version of an object whenever it is overwritten or deleted, so nothing is lost. It does require changes to client code so that it knows how to request a specific version of an object when multiple versions have been created by overwrites and deletes.
Cloud Storage Object Versioning
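As a minimal sketch with the google-cloud-storage Python client (the bucket name, object name, and generation number are placeholders), this enables versioning and then requests a specific generation of an object:

```python
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-bucket")  # placeholder bucket name

# Turn on Object Versioning so overwritten or deleted objects are kept
# as noncurrent versions instead of being lost.
bucket.versioning_enabled = True
bucket.patch()

# List every generation of an object, then fetch a specific one.
for blob in client.list_blobs("my-bucket", prefix="reports/data.csv", versions=True):
    print(blob.name, blob.generation, blob.time_deleted)

old_version = bucket.get_blob("reports/data.csv", generation=1700000000000000)  # placeholder generation
```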
For more complicated scenarios, implement Bucket Lock and retention policies. These features allow you to configure a data retention policy for a Cloud Storage bucket that governs how long objects in the bucket must be retained.
Retention policies and Bucket Lock
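If you go that route, a rough sketch of setting (and optionally locking) a retention policy with the same Python client follows; the bucket name and the 30-day period are placeholders, and note that locking is irreversible:

```python
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-bucket")  # placeholder bucket name

# Require every object to be retained for at least 30 days; deleting or
# replacing an object younger than that is rejected by Cloud Storage.
bucket.retention_period = 30 * 24 * 60 * 60  # seconds
bucket.patch()

# Optionally lock the policy (Bucket Lock). WARNING: this is irreversible;
# a locked retention period can be increased but never reduced or removed.
bucket.lock_retention_policy()
```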
Related
I have some files in my AWS S3 bucket which I would like to move from Standard storage to Glacier Deep Archive. After selecting the files and changing the storage class, the console gives the following message.
Since the message says that it will make a copy of the files, my question is: will I be charged extra for moving my existing files to another storage class?
Thanks.
"This action creates a copy of the object with updated settings and a new last-modified date. You can change the storage class without making a new copy of the object using a lifecycle rule.
Objects copied with customer-provided encryption keys (SSE-C) will fail to be copied using the S3 console. To copy objects encrypted with SSE-C, use the AWS CLI, AWS SDK, or the Amazon S3 REST API."
Yes, changing the storage class incurs costs, regardless of whether it's done manually or via a lifecycle rule.
If you do it via the console, it will create a deep archive copy but will retain the existing one as a previous version (if you have versioning enabled), so you'll be charged for storage of both (until you delete the original version).
If you do it via a lifecycle rule, it will transition (not copy) the files, so you'll only pay for storage for the new storage class.
In both cases, you'll have to pay for LIST ($0.005 per 1000 objects in STANDARD class) and COPY/PUT ($0.05 per 1000 objects going to DEEP_ARCHIVE class) actions.
Since data is being moved within the same bucket (and therefore within the same region), there will be no data transfer fees.
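As an illustration of the lifecycle-rule route, a minimal boto3 sketch (the bucket name, rule ID, and one-day threshold are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Lifecycle rule that transitions objects to DEEP_ARCHIVE once they are at
# least one day old; the empty prefix applies the rule to the whole bucket.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "to-deep-archive",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "Transitions": [
                    {"Days": 1, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)
```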
The only exception to the pricing described above is the S3 Intelligent-Tiering class, which automatically shifts objects between access tiers based on frequency of access and does not charge for those shifts.
No additional tiering fees apply when objects are moved between access tiers within the S3 Intelligent-Tiering storage class.
I'd like to use Terraform to move multiple GCS bucket objects from one bucket to another bucket to a different location.
I read through Terraform documentation but I couldn't find anything substantial.
The Terraform Google Cloud Storage provider only handles the creation of objects. What you can do as a workaround is to use Terraform with Storage Transfer Service, which schedules a job that transfers multiple objects into a GCS bucket, sourced from either AWS S3 or another GCS bucket.
Since this is a GCS to GCS transfer, you can take note of:
In the transfer_spec block, specify only gcs_data_source as the data source (rather than an AWS S3 source) to indicate that it is a GCS-to-GCS transfer.
The schedule block specifies when the transfer will start. If you intend to execute it just once, set schedule_end_date to the same date as schedule_start_date so the job runs only once.
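Under the hood, Terraform's google_storage_transfer_job resource drives the same Storage Transfer API. As a rough sketch, the equivalent one-off job created directly with the google-cloud-storage-transfer Python client might look like this; the project and bucket names are placeholders, and the job is run immediately instead of relying on a schedule:

```python
from google.cloud import storage_transfer

client = storage_transfer.StorageTransferServiceClient()

# One-off GCS-to-GCS transfer job; project and bucket names are placeholders.
job = client.create_transfer_job(
    {
        "transfer_job": {
            "project_id": "my-project",
            "status": storage_transfer.TransferJob.Status.ENABLED,
            "transfer_spec": {
                "gcs_data_source": {"bucket_name": "source-bucket"},
                "gcs_data_sink": {"bucket_name": "destination-bucket"},
            },
        }
    }
)

# Kick it off immediately instead of waiting for a schedule.
client.run_transfer_job({"job_name": job.name, "project_id": "my-project"})
```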
Storage Transfer Service also offers a guided setup in the Google Cloud Console, should you want to try it out:
https://cloud.google.com/storage-transfer/docs/create-manage-transfer-console#configure
I'm writing a service that takes screenshots of a lot of URLs and saves them in a public S3 bucket.
Due to storage costs, I'd like to periodically purge the aforementioned bucket and delete every screenshot that hasn't been accessed in the last X days.
By "accessed" I mean downloaded or acquired via a GET request.
I checked out the documentation and found a lot of ways to define an expiration policy for an S3 object, but couldn't find a way to "mark" a file as read once it's been accessed externally.
Is there a way to define the periodic purge without code (only AWS rules/services)? Does the API even allow that or do I need to start implementing external workarounds?
You can use Amazon S3 Storage Class Analysis:
By using Amazon S3 analytics storage class analysis you can analyze storage access patterns to help you decide when to transition the right data to the right storage class. This new Amazon S3 analytics feature observes data access patterns to help you determine when to transition less frequently accessed STANDARD storage to the STANDARD_IA (IA, for infrequent access) storage class.
After storage class analysis observes the infrequent access patterns of a filtered set of data over a period of time, you can use the analysis results to help you improve your lifecycle policies.
Even if you don't use it to change Storage Class, you can use it to discover which objects are not accessed frequently.
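As a sketch, enabling such an analysis with boto3 might look like the following; the bucket names, configuration ID, report bucket ARN, and prefix are all placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Enable storage class analysis on the screenshots bucket and export daily
# CSV reports of access patterns to a separate reports bucket.
s3.put_bucket_analytics_configuration(
    Bucket="screenshots-bucket",  # placeholder bucket name
    Id="access-pattern-analysis",
    AnalyticsConfiguration={
        "Id": "access-pattern-analysis",
        "StorageClassAnalysis": {
            "DataExport": {
                "OutputSchemaVersion": "V_1",
                "Destination": {
                    "S3BucketDestination": {
                        "Format": "CSV",
                        "Bucket": "arn:aws:s3:::analysis-reports-bucket",  # placeholder ARN
                        "Prefix": "screenshots-analysis/",
                    }
                },
            }
        },
    },
)
```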
There is no such service provided by AWS. You will have to write your own solution.
We've been using the Google Cloud Storage Transfer service, and in our data source (AWS) a directory was accidentally deleted. We figured it would still be in the data sink; however, upon taking a look, it wasn't there, despite versioning being on.
This leads us to believe that, in Storage Transfer, the deleteObjectsUniqueInSink option hard-deletes objects in the sink and removes them from the archive.
We've been unable to confirm this in the documentation.
Is GCS Transfer Service's deleteObjectsUniqueInSink parameter in the TransferSpec mutually exclusive with GCS's object versioning soft-delete?
When the deleteObjectsUniqueInSink option is enabled, Google Cloud Storage Transfer will:
List only the live versions of objects in source and destination buckets.
Copy any objects unique in the source to the destination bucket.
Issue a versioned delete for any unique objects in the destination bucket.
If the unique object is still live at the time that Google Cloud Storage Transfer issues the deletion, it will be archived. If another process, such as Object Lifecycle Management, archived the object before the deletion occurs, the object could be permanently deleted at this point rather than archived.
Edit: Specifying the version in the delete results in a hard delete (Objects Delete Documentation), so transfer service is currently performing hard deletes for unique objects. We will update the service to instead perform soft deletions.
Edit: The behavior has been changed. From now on deletions in versioned buckets will be soft deletes rather than hard deletes.
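To illustrate the difference the two edits describe, here is a small sketch with the google-cloud-storage Python client; the bucket name, object path, and generation number are placeholders:

```python
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("sink-bucket")  # placeholder bucket name

# Unversioned delete on a versioning-enabled bucket: the live object becomes
# a noncurrent (archived) version and can still be recovered.
bucket.delete_blob("path/to/object")

# Generation-specific (versioned) delete: that generation is removed
# permanently, even though the bucket has versioning enabled.
bucket.delete_blob("path/to/object", generation=1700000000000000)  # placeholder generation
```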
I have a task to control the lifecycle of particular objects in an S3 bucket. For example, most of the objects should expire and be deleted according to the lifecycle policy, but for some objects I want the expiration to never happen. In Amazon SQS it is possible to control lifecycle parameters of each single message, but I can't find such a feature in the docs for S3. Is it possible?
No, it isn't. Lifecycle policies apply to all the objects in the bucket, or to all the objects with a matching prefix. You'd need to set the policy on a specific key prefix, store the objects you want to match the policy under that prefix, and store the other objects under a different prefix. That's the closest thing available, and it's not really all that close.
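As a sketch of that prefix-based workaround with boto3 (the bucket name, prefix, and 30-day expiration are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Expire only objects stored under the "temp/" prefix after 30 days;
# objects under any other prefix are never expired by this rule.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-temp-objects",
                "Status": "Enabled",
                "Filter": {"Prefix": "temp/"},
                "Expiration": {"Days": 30},
            }
        ]
    },
)
```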