How to set metadata on objects created at a certain time? - google-cloud-platform

I want to set metadata on all objects whose creation date is 12 o'clock tonight. For now I can only set metadata for all objects that are already in a bucket, with the command below:
gsutil -m setmeta -h "Content-Type:application/pdf" -h "Content-disposition: inline" gs://mystorage/pdf/*.pdf
My plan is to set metadata on all new objects by running a gsutil command automatically at midnight, because I already have a command which uploads all files from my server to Google Storage every midnight. The only problem is that I don't know which files are new.
I know that we can use a Google Cloud trigger, but I want to stick with a gsutil command if that's possible.

I don't think gsutil or the GCS API provides a feature to set metadata on objects selected by timestamp.
According to the documentation, at upload time you can specify one or more metadata properties to associate with objects.
Since, as you mentioned, you already have a command which uploads all files from your server to Google Storage every midnight, you can set the metadata while uploading the objects. The command may look like the one below in your case (note that the headers are passed as top-level -h options to cp rather than via setmeta):
gsutil -m -h "Content-Type:application/pdf" -h "Content-Disposition:inline" cp -r pdf gs://mystorage/pdf
Alternatively, you can list the objects filtered by timestamp and store the output in a file. Then, iterating over each line of that output file, run your setmeta command on those objects (see the sketch below).
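A minimal Python sketch of that idea, using the google-cloud-storage client library; the bucket name, prefix, and the 24-hour "new" cutoff are assumptions you would adapt to your own setup:
from datetime import datetime, timedelta, timezone
from google.cloud import storage

client = storage.Client()
cutoff = datetime.now(timezone.utc) - timedelta(hours=24)  # assume "new" means created in the last day

# Patch metadata only on recently created PDF objects.
for blob in client.list_blobs("mystorage", prefix="pdf/"):
    if blob.time_created >= cutoff and blob.name.endswith(".pdf"):
        blob.content_type = "application/pdf"
        blob.content_disposition = "inline"
        blob.patch()  # sends only the changed metadata fields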

Or, you can use Pub/Sub notifications for Cloud Storage and subscribe to the OBJECT_FINALIZE event, which is sent when a new object is created.
Some sample code showing this can be found here.
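For illustration only, here is a hedged Python sketch of a Pub/Sub subscriber that reacts to OBJECT_FINALIZE messages and patches the new object's metadata; the project and subscription names are hypothetical, and it assumes the notification and subscription already exist:
from google.cloud import pubsub_v1, storage

storage_client = storage.Client()
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "gcs-new-objects")  # hypothetical names

def callback(message):
    # Cloud Storage notifications carry the event type and object location as message attributes.
    if message.attributes.get("eventType") == "OBJECT_FINALIZE":
        bucket = storage_client.bucket(message.attributes["bucketId"])
        blob = bucket.blob(message.attributes["objectId"])
        blob.content_type = "application/pdf"
        blob.content_disposition = "inline"
        blob.patch()
    message.ack()

streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
streaming_pull.result()  # block and keep processing incoming messages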

Related

gsutil / gcloud storage list files by limits and pagination

Is there any way we can list files from a GCS bucket with a limit?
Say I have 2k objects in my bucket, but when I do gsutil ls, I only want the first 5 objects, not all of them.
How can I achieve this?
Also, is there any pagination available?
gsutil ls gs://my-bucket/test_file_03102021* 2>/dev/null | grep -i ".txt$" || :
From looking at gsutil help ls, gsutil doesn't currently have an option to limit the number of items returned from an ls call.
While you could pipe the results to something like awk to get only the first 5 items, that would be pretty wasteful if you have lots of objects in your bucket (since gsutil would continue making paginated HTTP calls until it listed all N of your objects).
If you need to do this routinely on a bucket with lots of objects, you're better off writing a short script that uses one of the GCS client libraries. As an example, check out the google-cloud-storage Python library -- specifically, see the list_blobs method, which accepts a max_results parameter.
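For example, a minimal sketch of that approach (the bucket name is a placeholder):
from google.cloud import storage

client = storage.Client()

# Ask the API for at most 5 results instead of listing the whole bucket.
for blob in client.list_blobs("my-bucket", max_results=5):
    print(blob.name)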
Pagination is available when you use the API directly. If you want only the first 5 objects and you use gsutil, you will have to wait for the full listing of hundreds (thousands, millions, ...) of files before getting only the first 5.
If you use the API, you can do this:
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://storage.googleapis.com/storage/v1/b/<BUCKET_NAME>/o?alt=json&&maxResults=5" \
| jq .items[].name
Of course, you can change the maxResults size.
You can also include a prefix when you filter; more detail is in the API documentation. To continue past the first page, the response includes a nextPageToken that you pass back as pageToken on the next request (see the sketch below).
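A hedged Python sketch of the same page-by-page listing through the client library (bucket name and prefix are placeholders); page_size and page_token map to maxResults, pageToken, and nextPageToken in the raw API:
from google.cloud import storage

client = storage.Client()

# Fetch the first page of up to 5 objects.
iterator = client.list_blobs("my-bucket", page_size=5, prefix="test_file_")
print([blob.name for blob in next(iterator.pages)])

# The token can be stored and passed back later to resume where the listing left off.
token = iterator.next_page_token
if token:
    iterator = client.list_blobs("my-bucket", page_size=5, prefix="test_file_", page_token=token)
    print([blob.name for blob in next(iterator.pages)])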

How to delete GCP Storage notification programmatically?

In my bash script I want to recreate a GCS notification created by:
gsutil notification create -f json -t <topic> -p <prefix> gs://<bucket>
If I call this line again, it will create one more (identical) notification.
In order to delete the notification I need:
gsutil notification delete projects/_/buckets/<bucket>/notificationConfigs/<config-id>
config-id is the identifier returned when the notification is created. It can also be retrieved with:
gsutil notification list gs://<bucket>
The output of the list call is similar to:
projects/_/buckets/<bucket>/notificationConfigs/<config-id>
Cloud Pub/Sub topic: projects/<project>/topics/<topic>
Filters:
Object name prefix: '<prefix>'
This config-id does not look like something that is easy to parse in a shell script.
Is there a proper way to manage notifications? Or can I create notifications without duplicates (so that a second create call does not create a new notification but updates the existing one)?
If you use the CLI, that is the normal way. If you use the debug flag, there is an id field in the output, but I'm not sure the response is any easier to parse:
gsutil -D notification list gs://<bucket>
You can also use the Cloud Storage REST API for notifications.
In the list endpoint, you get a notification description that includes the ID of the notification, which is easy to extract this time.
Finally, you can use the client library (Python, for example), where you have handy methods, such as exists(), so you can avoid creating the same notification twice (see the sketch below).
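A hedged Python sketch of that approach with google-cloud-storage; the bucket, topic, and prefix names are placeholders:
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")  # placeholder names throughout
topic = "my-topic"
prefix = "my-prefix"

# Delete any existing notification that matches the same topic and prefix.
for notification in bucket.list_notifications():
    if notification.topic_name == topic and notification.blob_name_prefix == prefix:
        notification.delete()

# Recreate it (or call notification.exists() first and skip creation when it is already there).
notification = bucket.notification(
    topic_name=topic,
    blob_name_prefix=prefix,
    payload_format="JSON_API_V1",
)
notification.create()
print(notification.notification_id)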

How to find the number of records in a CSV file placed in Google Cloud Storage without downloading the file

I have tried this, but the system says 'wc' is not a valid command:
gsutil wc -l gs://folder/test.csv
Please help me find the number of records in a file without downloading it.
I have tried this and it is working:
gsutil cat gs://folder/test.csv | wc -l
Cloud Storage doesn't provide any computing resources to deal with the contents of an object. The only things you can do are upload objects, download objects, or read/write the metadata associated with an object. There is no operation to count lines or do anything else with the contents of an object.
Your choices are to either download the object and count the lines on the client, or count the lines before uploading the object and attach the count as metadata, so that it can be discovered later without requiring a full download (see the sketch below).
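A minimal sketch of the metadata approach in Python; the function name and metadata key are hypothetical:
from google.cloud import storage

def upload_with_line_count(local_path, bucket_name, blob_name):
    # Count lines locally, then store the count as custom metadata on the object.
    with open(local_path, "rb") as f:
        line_count = sum(1 for _ in f)

    blob = storage.Client().bucket(bucket_name).blob(blob_name)
    blob.metadata = {"line_count": str(line_count)}  # stored as x-goog-meta-line_count
    blob.upload_from_filename(local_path)
    return line_count

# Later the count can be read without downloading the file:
# blob = storage.Client().bucket("my-bucket").get_blob("test.csv")
# print(blob.metadata["line_count"])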

How to perform files integrity check between Amazon-S3 and Google Cloud Storage

I am migrating my data from Amazon-S3 to Google-Cloud Storage.
I have copied my data using gsutil:
$ gsutil cp -R s3://my_bucket/* gs://my_bucket
What I want to do next is to check that all the files in S3 properly exist in Google Storage.
At the moment all I did was print the file lists to files and run a simple Unix diff, but that doesn't really check file integrity.
What's the good way to check that?
gsutil verifies MD5 checksums on objects copied between cloud providers, so if the recursive copy command completes successfully (shell return code 0), you should have copied everything successfully. Note that gsutil isn't able to compare checksums for S3 objects larger than 5 GiB (which have a non-MD5 checksum that gsutil doesn't support), and will print a warning for cases it encounters.
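If you want an explicit check on top of that, here is a hedged Python sketch that compares object names, sizes, and MD5 hashes between the two buckets using boto3 and google-cloud-storage; the bucket names are placeholders, and keep in mind that an S3 ETag is only a plain MD5 for objects uploaded in a single part:
import base64

import boto3
from google.cloud import storage

s3 = boto3.client("s3")
gcs = storage.Client()

# Build {name: (size, md5_hex)} maps for both sides.
s3_objects = {}
for page in s3.get_paginator("list_objects_v2").paginate(Bucket="my_bucket"):
    for obj in page.get("Contents", []):
        # The ETag is the hex MD5 (quoted) only for non-multipart uploads.
        s3_objects[obj["Key"]] = (obj["Size"], obj["ETag"].strip('"'))

gcs_objects = {}
for blob in gcs.list_blobs("my_bucket"):
    md5_hex = base64.b64decode(blob.md5_hash).hex() if blob.md5_hash else None
    gcs_objects[blob.name] = (blob.size, md5_hex)

for name, (size, md5) in s3_objects.items():
    if name not in gcs_objects:
        print("missing in GCS:", name)
    elif gcs_objects[name] != (size, md5):
        print("mismatch:", name, gcs_objects[name], "!=", (size, md5))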

Sync command for OpenStack Object Storage (like S3 Sync)?

Using the S3 CLI, I can sync a local directory with an S3 bucket using the following command:
aws s3 sync s3://mybucket/ ./local_dir/
This command is a complete sync. It uploads new files, updates changed files, and deletes removed files. I am trying to figure out how to do something equivalent using the OpenStack Object Storage CLI:
http://docs.openstack.org/cli-reference/content/swiftclient_commands.html
The upload command has a --changed option. But I need a complete sync that is also capable of deleting local files that were removed.
Does anyone know if I can do something equivalent to s3 sync?
The link you mentioned has this:
objects – A list of file/directory names (strings) or SwiftUploadObject instances containing a source for the created object, an object name, and an options dict (can be None) to override the options for that individual upload operation
I'm thinking that if you pass the directory along with the --changed option, it should work (see the sketch below).
I don't have a Swift deployment to test with. Can you try it and report back?
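A hedged Python sketch of that idea with the python-swiftclient SwiftService API that the quote describes; the container and directory names are placeholders, and note that the changed option only skips re-uploading unchanged files, it does not delete objects that were removed locally:
from swiftclient.service import SwiftError, SwiftService

# Upload the local directory, skipping files whose content has not changed.
with SwiftService() as swift:
    try:
        for result in swift.upload("my-container", ["./local_dir"], options={"changed": True}):
            if not result["success"]:
                print("failed:", result.get("object"), result.get("error"))
    except SwiftError as e:
        print(e.value)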