I am aware of a similar concept in AWS, where a bucket can hold objects of multiple storage classes, e.g. a Standard object and a Coldline object.
I tried googling for the same in GCP, since the objects I will store need to be of different storage classes because they won't be accessed frequently.
Yes, a GCS bucket can hold objects of multiple storage classes. Refer to this document: DOC1.
See DOC2 for detailed steps and an explanation of how to change the storage class of an individual object within a bucket.
Moreover, there are multiple storage classes available in GCP:
Standard - A normal storage class, suitable for frequently accessed data.
Nearline - Recommended when the data needs to be accessed on average once every 30 days or less.
Coldline - Can be used for infrequently accessed data that needs to be read on average once per quarter, i.e., every 90 days.
Archive - The best storage plan when the data needs to be accessed about once per year, i.e., every 365 days.
Note: Pricing differs for each storage class, depending on the type you choose.
For more detailed information, refer to these documents: DOC1, DOC2.
Yes. You can set the storage class in a number of ways:
First, when you upload an object, you can specify its storage class. It's a property of most of the client libraries' "write" or "upload" methods. If you're using the JSON API directly, check the storageClass property on the objects.insert call. If you're using the XML API, use the x-goog-storage-class header.
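For example, with the Python client library that can look roughly like this (a minimal sketch; the bucket and object names are placeholders):

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")  # placeholder bucket name

# Setting storage_class on the blob before uploading sends the
# storageClass property along with the upload request.
blob = bucket.blob("reports/2023-archive.csv")  # placeholder object name
blob.storage_class = "NEARLINE"
blob.upload_from_filename("2023-archive.csv")
```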
Second, you can also set the "default storage class" on the bucket, which will be used for all object uploads that do not specify a class.
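A rough sketch of setting a bucket's default storage class with the Python client (bucket name is a placeholder):

```python
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-bucket")  # placeholder bucket name

# New uploads that don't specify a storage class will inherit this default.
bucket.storage_class = "COLDLINE"
bucket.patch()
```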
Third, you can change an object's storage class using the objects.rewrite call. If you're using a client library like the Python one, you can use a function like blob.update_storage_class(new_storage_class) to change the storage class (note that this counts as an object write).
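A minimal sketch of that with the Python client (bucket and object names are placeholders); the call rewrites the object, so it counts as an object write:

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")  # placeholder bucket name
blob = bucket.get_blob("reports/2023-archive.csv")  # placeholder object name

# Issues an objects.rewrite under the hood and counts as an object write.
blob.update_storage_class("COLDLINE")
```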
Finally, you can put "lifecycle policies" on your bucket that will automatically transition storage classes for individual objects over time or in response to some change. For example, you could have a rule like "downgrade an object's storage class to coldline 60 days after its creation." See https://cloud.google.com/storage/docs/lifecycle for more.
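As a rough illustration, that example rule could be added from the Python client like this (a sketch; the bucket name and the 60-day threshold are just examples):

```python
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-bucket")  # placeholder bucket name

# "Downgrade an object's storage class to Coldline 60 days after creation."
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=60)
bucket.patch()
```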
Full documentation of storage classes can be found at: https://cloud.google.com/storage/docs/storage-classes
I want to change the storage class of existing objects in a GCP bucket based on their access pattern, e.g. the number of downloads. I found this link:
https://cloud.google.com/storage/docs/lifecycle
which is based on the object creation time. Is there any way to achieve the same based on the download pattern?
I am looking for a way to update the ACLs of several objects in one (or a few) requests to the AWS API.
My web application contains several sensitive objects stored in AWS S3. These objects have a default ACL of "private". I sometimes need to set several objects' ACLs to "public-read" for some time (a couple of minutes) before switching back to "private".
For a couple of objects, one request per object to PutObjectAcl is fine. But when dealing with several hundred objects, the operation takes too much time.
My question is: how can I "mass put object acl" or "bulk put object acl"? The AWS API doesn't seem to offer this, unlike DeleteObjects (which allows deleting several objects at once). But maybe I didn't look in the right place?!
Any trick or workaround would be of great value!
Mixing private and public objects inside a bucket is usually a bad idea. If you only need those objects to be public for a couple of minutes, you can create a pre-signed GET URL and set a desired expiration time.
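A minimal sketch with boto3 (the bucket name, key, and 5-minute expiry are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Generate a time-limited GET URL for a private object. The object's ACL
# stays private; anyone holding the URL can read it until it expires.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "sensitive/report.pdf"},
    ExpiresIn=300,  # seconds
)
print(url)
```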
My requirement is to move files to Archive once (current time - last access time) is greater than a specific value. Is such an option possible?
I went through the documentation but did not see any option to change the storage class based on a last-accessed timestamp.
You can use lifecycle on Cloud Storage to change the storage class based on temporal conditions.
Google's lifecycle rules have an optional condition called "Days since custom time".
Presumably you could set the custom time whenever you access an object and this would work.
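A rough sketch of that idea with the Python client (the bucket/object names and the 90-day threshold are assumptions, not values from the question):

```python
import datetime
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-bucket")  # placeholder bucket name

# Lifecycle rule: move objects to Archive 90 days after their custom time.
bucket.add_lifecycle_set_storage_class_rule("ARCHIVE", days_since_custom_time=90)
bucket.patch()

# Whenever the application reads an object, bump its custom time so the
# 90-day clock restarts from the last access.
blob = bucket.get_blob("reports/2023-archive.csv")  # placeholder object name
blob.custom_time = datetime.datetime.now(datetime.timezone.utc)
blob.patch()
```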
When using bucket.getFilesStream, which auto-paginates through the files in a bucket, is each page's worth of data considered a single Class A operation? Or is the entire paginated stream considered a single Class A operation?
If it's multiple operations, is there a cheaper way to get a list of all files in a bucket, assuming there are millions of files?
According to the official Cloud Storage JSON API reference, the method for listing a bucket's objects is storage.objects.list. It retrieves a list of objects matching the specified criteria, and it is the method the client libraries use to retrieve the list of objects in a bucket. As long as this is the only method to achieve this, there isn't any workaround to list a bucket's objects more cheaply.
As you can see in the Google Cloud Storage pricing documentation, a call to this method is considered a Class A operation. The number of calls depends on how the Node.js client uses the JSON API.
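If it helps to picture it, here is a rough Python sketch of the same idea (the Node.js stream behaves similarly, and the bucket name is a placeholder): each page fetched is one storage.objects.list request, so asking for large pages keeps the number of Class A operations as low as the API allows. This assumes a google-cloud-storage version that accepts the page_size argument.

```python
from google.cloud import storage

client = storage.Client()

# Each page is one storage.objects.list call, i.e. one Class A operation.
blobs = client.list_blobs("my-bucket", page_size=1000)  # placeholder bucket name

page_count = 0
for page in blobs.pages:
    page_count += 1
    for blob in page:
        pass  # process blob.name here

print(f"Listed the bucket in {page_count} list calls")
```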
We're looking into Google Nearline as a solution for some "warm" storage requirements. Basically we expect parts of a dataset of around 5 PB to be accessed every now and again, but the whole set very infrequently.
That said, there may be one or two times a year when we want to run something across the whole dataset (i.e. patch all the data with a new field). These jobs would run within GCP (Dataproc). Doing this on Nearline blows up our budget by 50k each time.
Wondering if there are possibilities of changing the storage class without incurring the full data-retrieval penalty? I see that a storage class can be changed via a gsutil rewrite, but this will retrieve the data.
Perhaps we can use a lifecycle rule to change the storage class without a retrieval? Or is there any other way to do it?
A gsutil rewrite operation ends up creating new objects in the target storage class, which means you read GCS objects in one storage class and write them in another (i.e., new objects get created).
This operation is charged to your project.