Google Cloud Storage Transfer - google-cloud-platform

We’ve been using Google Cloud Storage Transfer service and in our data source (AWS) we had a directory accidentally deleted, so we figured it would be in the data sink however upon taking a looking it wasn’t there despite versioning being on.
This leads us to believe in Storage Transfer the option deleteObjectsUniqueInSink hard deletes objects in the sink and removes them from the archive.
We'e been unable to confirm this in the documentation.
Is GCS Transfer Service's deleteObjectsUniqueInSink parameter in the TransferSpec mutually exclusive with GCS's object versioning soft-delete?

When the deleteObjectsUniqueInSink option is enabled, Google Cloud Storage Transfer will
List only the live versions of objects in source and destination buckets.
Copy any objects unique in the source to the destination bucket.
Issue a versioned delete for any unique objects in the destination bucket.
If the unique object is still live at the time that Google Cloud Storage Transfer issues the deletion, it will be archived. If another process, such as Object Lifecycle Management, archived the object before the deletion occurs, the object could be permanently deleted at this point rather than archived.
Edit: Specifying the version in the delete results in a hard delete (Objects Delete Documentation), so transfer service is currently performing hard deletes for unique objects. We will update the service to instead perform soft deletions.
Edit: The behavior has been changed. From now on deletions in versioned buckets will be soft deletes rather than hard deletes.

Related

Google Cloud storage bucket not listing deleted objects

Two days after having manually deleted all the objects in a multi-region Cloud Storage bucket (e.g. us.artifacts.XXX.com) without Object Versioning I noticed that the bucket size hadn't decreased at all. Only when trying to delete the bucket I discovered that it actually stills containing the objects that I had presumably deleted.
Why aren't those objects displayed in the bucket list view, even when enabling Show deleted data?
When deploying a Function for the first time, two buckets are created automatically:
gcf-sources-XXXXXX-us-central1
us.artifacts.project-ID.appspot.com
You can observe these two buckets from the GCP Console by clicking on Cloud Storage from the left panel.
The files you're seeing in bucket us.artifacts.project-ID.appspot.com are related to a recent change in how the runtime (for Node 10 and up) is built as this post explains.
I also found out that this bucket doesn't have object versioning, retention policy or any lifecycle rule. Although you delete this bucket, it will be created again when you deploy the related function, so, if you are seeing unexpected amounts of Cloud Storage used, this is likely caused by a known issue with the cleanup of artifacts created in the function deployment process as indicated here.
Until the issue is resolved, you can avoid hitting storage limits by creating an auto-deletion rule in the Cloud Console:
In the Cloud Console, select your project > Storage > Browser to open the storage browser.
Select the "artifacts" bucket from the list.
Under the Lifecycle tab, add a rule to auto-delete old images. Choose a deletion interval that works within your normal rate of deployments.
If possible, try to reproduce this scenario with a new function. In the meantime, take into account that if you delete many objects at once, you can track deletion progress by clicking the Notifications icon in the Cloud Console.
In addition, the Google Cloud Status Dashboard provides information about regional or global incidents affecting Google Cloud services such as Cloud Storage.
Nevermind! Eventually (at some point between 2-7 days after the deletion) the bucket size decreased and the objects are no longer displayed in the "Delete bucket" dialog.

AWS storage class change

I have some files in my AWS S3 bucket which i would like to put in Glacier Deep Archive from Standard Storage. After selecting the files and changing the storage class, it gives the following message.
Since the message says that it will make a copy of the files, my question is that will I be charged extra for moving my existing files to another storage class?
Thanks.
"This action creates a copy of the object with updated settings and a new last-modified date. You can change the storage class without making a new copy of the object using a lifecycle rule.
Objects copied with customer-provided encryption keys (SSE-C) will fail to be copied using the S3 console. To copy objects encrypted with SSE-C, use the AWS CLI, AWS SDK, or the Amazon S3 REST API."
Yes, changing the storage class incurs costs, regardless of whether it's done manually or via a lifecycle rule.
If you do it via the console, it will create a deep archive copy but will retain the existing one as a previous version (if you have versioning enabled), so you'll start being charged for storage both (until you delete the original version).
If you do it via a lifecycle rule, it will transition (not copy) the files, so you'll only pay for storage for the new storage class.
In both cases, you'll have to pay for LIST ($0.005 per 1000 objects in STANDARD class) and COPY/PUT ($0.05 per 1000 objects going to DEEP_ARCHIVE class) actions.
Since data is being moved within the same bucket (and therefore within the same region), there will be no data transfer fees.
The only exception to this pricing is the "intelligent tiering" class, which automatically shifts objects between storage classes based on frequency of access and does not charge for shifting classes.
No additional tiering fees apply when objects are moved between access tiers within the S3 Intelligent-Tiering storage class.

How to make a Google Cloud Storage bucket append-only?

Is there a way to make a Google Cloud Storage bucket "append-only"?
To clarify, I want to make it so that trying to overwrite/modify an existing object returns an error.
Right now the only way I see to do this is client-side, by checking if the object exists before trying to write to it, but that doubles the number of calls I need to make.
There are several Google Cloud Storage features that you can enable:
Object Versioning
Bucket Lock
Retention Policies
The simplest method is to implement Object Versioning. This prevents objects from being overwritten or deleted. This does require changes to client code to know how to request a specific version of an object if multiple versions have been created due to object overwrites and deletes.
Cloud Storge Object Versioning
For more complicated scenarios implement bucket lock and retention policies. These features allow you to configure a data retention policy for a Cloud Storage bucket that governs how long objects in the bucket must be retained
Retention policies and Bucket Lock

AWS S3 deletion of files that haven't been accessed

I'm writing a service that takes screenshots of a lot of URLs and saves them in a public S3 bucket.
Due to storage costs, I'd like to periodically purge the aforementioned bucket and delete every screenshot that hasn't been accessed in the last X days.
By "accessed" I mean downloaded or acquired via a GET request.
I checked out the documentation and found a lot of ways to define an expiration policy for an S3 object, but couldn't find a way to "mark" a file as read once it's been accessed externally.
Is there a way to define the periodic purge without code (only AWS rules/services)? Does the API even allow that or do I need to start implementing external workarounds?
You can use Amazon S3 Storage Class Analysis:
By using Amazon S3 analytics storage class analysis you can analyze storage access patterns to help you decide when to transition the right data to the right storage class. This new Amazon S3 analytics feature observes data access patterns to help you determine when to transition less frequently accessed STANDARD storage to the STANDARD_IA (IA, for infrequent access) storage class.
After storage class analysis observes the infrequent access patterns of a filtered set of data over a period of time, you can use the analysis results to help you improve your lifecycle policies.
Even if you don't use it to change Storage Class, you can use it to discover which objects are not accessed frequently.
There is no such service provided by AWS.. You will have to write your own solution.

Amazon S3 consistency model for new objects

Looking at https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html,
Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD or GET request to the key name (to find if the object exists) before creating the object, Amazon S3 provides eventual consistency for read-after-write.
My understanding from this is that if I create a new object and I haven't checked for its existence beforehand, it should be available immediately (e.g., show up in list requests).
But the above link also says
...you might observe the following behaviors:
A process writes a new object to Amazon S3 and immediately lists keys within its bucket. Until the change is fully propagated, the object might not appear in the list.
which contradicts the first statement as it basically says read-after-write consistency is always eventual for PUTS.
I read it as:
Amazon guarantees that a ReadObject request (GET and HEAD) will succeed (read-after-write consistency) for any newly PUT object, assuming you haven't requested the object before
Amazon doesn't guarantee that a ListBucket request will be immediately consistent for any newly PUT object, but rather the new object will eventually show up in a ListBucket request (eventual consistency)
S3 is now strongly consistent, all S3 GET, PUT, and LIST operations, as well as operations that change object tags, ACLs, or metadata, are now strongly consistent. What you write is what you will read, and the results of a LIST will be an accurate reflection of what’s in the bucket. This applies to all existing and new S3 objects, works in all regions, and is available to you at no extra charge! There’s no impact on performance, you can update object hundreds of times per second if you’d like, and there are no global dependencies.
https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-read-after-write-consistency/