Find the sizes of sub-directories of a GCP bucket - google-cloud-platform

I have a GCP bucket and want to find the size of each sub-directory down to a depth of 2.
For example:
GCP bucket is : gs://gcp-bucket/
It has folders under it, like this:
gs://gcp-bucket/city/a/
gs://gcp-bucket/city/b/
gs://gcp-bucket/city/c/
gs://gcp-bucket/city/d/
I want to find the size of each of the above folders from a Linux server. There are files under those folders, but I want only the size of each directory, not the sizes of the individual files under it.
I tried gsutil commands but could not get this to work.

The option isn't available with gsutil du .... You need to build it yourself.
In fact the reason is simple: directories don't exist in Cloud Storage. Only the files that match the same prefix (the "directory path") are presented in the same "group" or "folder".
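One way to build it is a short script that lists the objects once and sums their sizes per depth-2 prefix. This is only a sketch, assuming the google-cloud-storage Python client is installed and authenticated; the bucket name gcp-bucket is taken from the question, and the depth check can be adjusted as needed.
from collections import defaultdict
from google.cloud import storage

# "gcp-bucket" is the bucket name from the question; replace as needed.
bucket = storage.Client().bucket("gcp-bucket")

sizes = defaultdict(int)
for blob in bucket.list_blobs():
    parts = blob.name.split("/")
    if len(parts) > 2:                      # only objects below depth 2, e.g. city/a/...
        prefix = "/".join(parts[:2]) + "/"  # depth-2 prefix, e.g. "city/a/"
        sizes[prefix] += blob.size or 0

for prefix, total in sorted(sizes.items()):
    print(f"gs://gcp-bucket/{prefix}\t{total} bytes")
This lists every object once and groups the byte counts under each depth-2 "folder", which is effectively what gsutil du would have to do anyway.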

Related

AWS configuration to delete files

I have an "Execution" folder in an S3 bucket.
It has folders and files like
Execution/
  Exec_06-06-2022/
    file1.json
    file2.json
  Exec_07-06-2022/
    file3.json
    file4.json
I need to configure deletion of the Exec_<datestamp> folders and the files inside them after X days.
I tried this using an AWS lifecycle config for the prefix "Execution/".
But it deletes the Execution/ folder itself after X days (I set this to 1 day to test).
Is there any other way to achieve this?
There are no folders in S3. Execution is part of the objects' names, specifically their key prefix. The S3 console only makes Execution appear as a folder, but there is no such thing in S3. So your lifecycle rule deletes the Execution/ object (not a folder), because it matches your filter.
You can try with Execution/Exec* filter.
Got it, https://stackoverflow.com/a/41459761/5010582. The prefix has to be "Execution/Exec" without any wildcard.
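For reference, a rough sketch of the same rule applied with boto3 instead of the console; the bucket name and the 30-day expiration are placeholders, and only the "Execution/Exec" prefix comes from the thread.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-exec-folders",
                "Status": "Enabled",
                # No wildcard: every key starting with this prefix matches,
                # but the zero-byte Execution/ marker object does not.
                "Filter": {"Prefix": "Execution/Exec"},
                "Expiration": {"Days": 30},  # placeholder for "X days"
            }
        ]
    },
)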

AWS S3 - Use powershell to delete all files but keep the folders

I have a PowerShell script that downloads all files from an S3 bucket and then removes the files from the bucket. All the files I'm removing are stored in a subfolder in the S3 bucket, and I just want to delete the files but maintain the subfolders.
I'm currently using the following command to delete the files in S3 once the file has been downloaded from S3.
Remove-S3Object -BucketName $S3Bucket -Key $key -Force
My problem is that if it removes all the files in the subfolder, the subfolder is removed as well. Is there a way to remove the files but keep the subfolder present using PowerShell? I believe I can do something like this,
aws s3 rm s3://<key_to_be_removed> --exclude "<subfolder_key>"
but I'm not quite sure if that will work.
I'm looking for the best way to accomplish this, and at the moment my only option is to recreate the subfolder via the script if the subfolder no longer exists.
The only way to accomplish having an empty folder is to create a zero-length object which has the same name as the folder you want to keep. This is actually how the S3 console enables you to create an empty folder.
You can check this by running $ aws s3 ls s3://your-bucket/folderfoo/ and observing an output object with a length of zero bytes.
See more on this topic here.
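As an illustration of that zero-length-object trick (shown here with boto3 rather than the AWS PowerShell module from the question; the bucket and key names are placeholders):
import boto3

s3 = boto3.client("s3")
# A zero-byte object whose key ends in "/" is what the S3 console creates
# for an "empty folder"; recreating it keeps the subfolder visible after
# all the real files under it have been deleted.
s3.put_object(Bucket="my-bucket", Key="subfolder/", Body=b"")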
As already commented, S3 does not really have folders the way file systems do. The folders as presented by most S3 browsers are just generated based on the paths of the files/objects. If you upload an object/file named folder/file, the browsers will present folder as folder with file as a file in the folder. But technically, all that there is is the file/object folder/file. The folder does not exist on its own.
You can explicitly create a folder by creating an empty object whose name is the folder path itself: folder/. If you do that, the folder will appear to exist even if there are no files in it. But if you do not do that, the virtual folder disappears once you remove all objects in the folder.
Now the question is whether your command also removes that empty object representing the folder or not. I cannot tell that.

Assist With AWS CLI S3 Bucket/Folder File Search

I have several folders and only certain folders contain files with ".003" at the end of their name. These files do not have an extension.
I am interested in finding out:
The names of the folders (inside the bucket) that any of those files ARE in, ideally listed only once (no duplicates).
The names of the folders that those files are NOT in.
I know how to do a search for a file like so:
aws s3 ls s3://{bucket}/{folder1}/{folder2} --recursive |grep "\.003"
Are there CLI commands that can give me what I am looking for?
If this or something like this has been asked before please point me in the correct direction. My apologies if so! :)
Thank you for your time!
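No answer is included above, but one possible approach (a sketch only, assuming boto3 and placeholder bucket/prefix names) is to list the objects once and split the containing prefixes by whether any of their keys end in ".003":
import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

all_folders, with_003 = set(), set()
for page in paginator.paginate(Bucket="my-bucket", Prefix="folder1/folder2/"):
    for obj in page.get("Contents", []):
        folder = obj["Key"].rsplit("/", 1)[0] + "/"  # the containing "folder"
        all_folders.add(folder)
        if obj["Key"].endswith(".003"):
            with_003.add(folder)

print("Folders containing .003 files:", sorted(with_003))
print("Folders without .003 files:", sorted(all_folders - with_003))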

How to copy only the files from many subdirectories under a directory to another project's bucket in GCP?

I have a huge amount of data in my Google Cloud Storage bucket and have to copy all the files to a bucket in another project. The main problem is that in this bucket I created some folders, each with many sub-folders, and all the sub-folders contain data. So when I use the normal gsutil copy command, it copies all the data along with the folder structure.
I need help resolving this, because it is taking too much time to copy from one project's bucket to the other.
You can use this command to have all the files in the root path.
gsutil cp 'gs://[YOUR_FIRST_BUCKET_NAME]/*' gs://[YOUR_SECOND_BUCKET_NAME]
If you have nested directories inside your bucket, use this command:
gsutil cp -r 'gs://[YOUR_FIRST_BUCKET_NAME]/*' gs://[YOUR_SECOND_BUCKET_NAME]
Pay attention to single quotes around the first command.
You can take a look at the Wildcard Names if you need more advanced features.
You can use Google Data Transfer Service
It is the second option in the Google Cloud Storage subcategory.
Use gsutil cp command without -r option.
The -R and -r options are synonymous. Causes directories,
buckets, and bucket subdirectories to be copied recursively.
If you neglect to use this option for an upload, gsutil will
copy any files it finds and skip any directories. Similarly,
neglecting to specify this option for a download will cause
gsutil to copy any objects at the current bucket directory
level, and skip any subdirectories.
If I understand correctly, you want to copy all the files from one bucket to another bucket, but you don't want to keep the same hierarchy; instead, you want all the files in the root path.
There is currently no way to do that with gsutil alone, but you can do it with a script. Here is my solution:
from google.cloud import storage

bucketOrigin = storage.Client().get_bucket("<BUCKET_ID_ORIGIN>")
bucketDestination = storage.Client().get_bucket("<BUCKET_ID_DESTINATION>")

for blob in bucketOrigin.list_blobs():
    # Download each object and re-upload it under its base name only,
    # dropping everything up to the last "/" so the copy lands in the root.
    strfile = blob.download_as_string()
    blobDest = bucketDestination.blob(blob.name[blob.name.rfind("/") + 1:])
    blobDest.upload_from_string(strfile)
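If downloading and re-uploading every object is too slow for a large bucket, a variation on the same idea (a sketch, using the same placeholder bucket IDs) is to let Cloud Storage copy the objects server-side with copy_blob:
from google.cloud import storage

client = storage.Client()
bucketOrigin = client.get_bucket("<BUCKET_ID_ORIGIN>")
bucketDestination = client.get_bucket("<BUCKET_ID_DESTINATION>")

for blob in bucketOrigin.list_blobs():
    flat_name = blob.name[blob.name.rfind("/") + 1:]
    if not flat_name:  # skip zero-byte "folder" placeholder objects
        continue
    # copy_blob performs a server-side copy, so the data never leaves GCS.
    bucketOrigin.copy_blob(blob, bucketDestination, new_name=flat_name)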
As mentioned by Akash Dathan, you can use the Cloud Storage Transfer Service to move your bucket content. I recommend you take a look at this Moving and Renaming Buckets guide, where you can find the steps required to perform this task.
Bear in mind the following requirements:
The Transfer Service's service account must have permission to read from your source and write to your destination.
If you're deleting the source files, the Transfer Service's service account will also need delete access to the source.
If your service account doesn't have these permissions yet, a bucket owner must grant them.
Note: if you have 'storage.buckets.setIamPolicy' permission for the source and destination buckets, creating a transfer job will grant that service account the required source and destination permissions to complete the transfer.
You can list all the files from your subfolders and get each file name using the split() method, then use the copy() method to copy the file to another bucket. The snippet below removes all subfolder prefixes:
// storage, srcBucketName, destBucketName and prefix are assumed to be
// defined elsewhere (e.g. storage = new Storage() from @google-cloud/storage).
const [files] = await storage.bucket(srcBucketName).getFiles();
files.forEach((file) => {
  let fileName = file.name.split("/").pop();
  if (fileName)
    file.copy(storage.bucket(destBucketName).file(`${prefix}/${fileName}`));
});

Replicate local directory in S3 bucket

I have to replicate my local folder structure in an S3 bucket. I am able to do so, but it does not create the folders which are empty. My local folder structure is as follows, and the command used is:
aws-exec s3 sync ./inbound s3://msit.xxwmm.supplychain.relex.eeeeeeeeee/
It only creates inbound/procurement/pending/test.txt; masterdata and transaction are not created, but if I put some file in each directory it will create them.
As answered by @SabeenMalik in this StackOverflow thread:
S3 doesn't have the concept of directories, the whole folder/file.jpg
is the file name. If using a GUI tool or something you delete the
file.jpg from inside the folder, you will most probably see that the
folder is gone too. The visual representation in terms of directories
is for user convenience.
You do not need to pre-create the directory structure. Just pretend that the structure is there and everything will be okay.
Amazon S3 will automatically create the structure as objects are written to paths. For example, creating an object called s3://bucketname/inbound/procurement/foo will automatically create the directories.
(This isn't strictly true because Amazon S3 doesn't use directories, but it will appear that the directories are there.)
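If the empty local folders really do need to appear in the bucket, one workaround (not from the thread, and in the spirit of the zero-length-object trick above) is to create zero-byte placeholder keys for them. A rough sketch with boto3, using a placeholder bucket name:
import os
import boto3

s3 = boto3.client("s3")
local_root = "./inbound"
bucket = "my-bucket"  # placeholder for the real destination bucket

for dirpath, dirnames, filenames in os.walk(local_root):
    if not dirnames and not filenames:  # empty directory that sync skips
        rel = os.path.relpath(dirpath, local_root)
        if rel == ".":
            continue
        # Key is relative to the sync root; adjust if you sync into a prefix.
        key = rel.replace(os.sep, "/") + "/"
        s3.put_object(Bucket=bucket, Key=key, Body=b"")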