GCP: How to copy files automatically from Project A to Project B every Monday? - google-cloud-platform

GCP is new to me, but I want to know if it's possible to copy a specific file (e.g. myFiles.csv) from a bucket in Project A to a bucket in Project B every Monday at 6:00 am.
I need this because myFiles.csv is overwritten every Monday and I need to share it with Project B.

You can use Storage Transfer Service:
https://cloud.google.com/storage-transfer/docs/create-transfers#google-cloud-console
With this service, you can select the source and destination buckets as well as scheduling options (every day in your case).
Source bucket (project A):
In this example, I selected a folder team_league in a bucket called mazlum_dev.
In the prefix field, I added the name of the file I want to transfer: input_team_slogans.json.
You have to put your own file name for your job.
Destination bucket (project B):
You have to select the output folder of your destination bucket.
Scheduling options:
You can also use the Google Cloud SDK with gsutil if needed:
gsutil cp gs://your_bucket_project_a/your_file gs://your_bucket_project_b/output/
But you have to find a way to run this command on a cron schedule, which is why I recommend the first solution: everything is native and already integrated for your need.
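For example, if you run the copy from a machine or small VM that has the Cloud SDK installed, a crontab entry along these lines would trigger it every Monday at 6:00 am. This is only a sketch; the bucket names are the same placeholders used in the command above:
# minute hour day-of-month month day-of-week: 06:00 every Monday
0 6 * * 1 gsutil cp gs://your_bucket_project_a/myFiles.csv gs://your_bucket_project_b/output/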

Follow the steps below:
In the web console, go to Storage > Transfer to create a new transfer. Select the source bucket you want to copy from (in Project A). When you get to the destination part of the transfer form, you can type or paste the target bucket (in Project B) directly into its text input, even if that bucket belongs to another project; a green icon appears once the target has been verified as an existing bucket. Continue the form to finalize your setup.
Once you start the transfer from the form, you can follow its progress by hitting the refresh button at the top of the console.
This works because bucket identifiers are globally unique, which is key to the solution.
Refer to this SO link for more information.
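If you prefer the command line, recent versions of the Cloud SDK also expose Storage Transfer Service through gcloud transfer. The following is only a rough sketch with placeholder bucket names and an illustrative start time; flag names can vary between gcloud versions, so check gcloud transfer jobs create --help before relying on it:
gcloud transfer jobs create \
  gs://bucket-in-project-a gs://bucket-in-project-b \
  --include-prefixes=myFiles.csv \
  --schedule-starts=2024-01-01T06:00:00Z \
  --schedule-repeats-every=7d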

Related

Google Cloud Transfer Job is creating one extra folder

I have created a Transfer Job to import some of my website's static resources to Google storage.
The job was supposed to import the data into a bucket named www.pretty-story.com.
It is importing from a tsv file located here.
For instance, the first URL is:
https://www.pretty-story.com/wp-includes/js/jquery/jquery.min.js
so I would have expected the job to create the folder structure starting with wp-includes.
But instead the job created this folder structure: www.pretty-story.com/wp-includes/js/jquery.
Therefore the complete path (including my bucket name) is:
www.pretty-story.com/www.pretty-story.com/wp-includes/js/jquery.
How can I tell the data transfer job to use the bucket as the first folder, instead of creating a subfolder with the same name?
According to https://cloud.google.com/storage-transfer/docs/create-url-list:
When an object located at http(s)://[HOSTNAME]:[PORT]/[URL_PATH] is transferred to Cloud Storage, the name of the object in Cloud Storage is [HOSTNAME]/[URL_PATH].
You don't have an option to skip the [HOSTNAME]/ part of this, so what you are asking is not possible.
If the amount of data involved is reasonable, I recommend downloading it to a workstation and using gsutil to copy it into a bucket without the hostname prefix.
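For instance, a rough sketch with gsutil, assuming the objects have already landed in your bucket under the extra www.pretty-story.com/ prefix and that the data fits comfortably on your workstation:
# Pull the transferred objects down locally...
mkdir -p /tmp/site
gsutil -m cp -r "gs://www.pretty-story.com/www.pretty-story.com/*" /tmp/site/
# ...then push them back up without the hostname prefix
gsutil -m cp -r /tmp/site/* gs://www.pretty-story.com/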

Connect BigQuery as a source to Data Fusion in another GCP project

I am trying to connect BigQuery in Project A to Data Fusion in Project B, and it is asking me to enter a service key file. I have tried uploading the service key file to Cloud Storage in Project B and providing the link, but it asks for a local file path.
Can someone help me on this?
Thanks in advance.
Can you try this: grant BigQuery permissions in Project A to the following Data Fusion service accounts of Project B:
service-project_number@gcp-sa-datafusion.iam.gserviceaccount.com
project_number-compute@developer.gserviceaccount.com
Steps:
1) Navigate to the customer project that contains the CDF instance and copy the project number (this is found on the Home page in the Project Info card).
2) Navigate to the project that contains the resources you would like to interact with.
3) In the sidebar, click on ‘IAM & Admin’.
4) Click on ‘Add’ at the top of the page.
5) Provide the first service account name listed above; be sure to replace project_number with the actual number you obtained in step 1.
6) Grant the Admin role for the resource you would like to interact with, e.g. BigQuery Admin for reading/writing to BigQuery. For BigQuery, you will also need to grant the BigQuery Data Owner role.
7) Repeat steps 5 and 6 for the second service account listed above.
8) In your pipeline, ensure you define the correct Project ID for the sources/sinks. Using ‘auto-detect’ will default to the customer project that contains the CDF instance.
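If you prefer to grant these roles from the command line, here is a minimal sketch with gcloud, where PROJECT_A_ID and PROJECT_B_NUMBER are placeholders for your own project ID and project number:
# Grant the Data Fusion service agent of Project B access to BigQuery in Project A
gcloud projects add-iam-policy-binding PROJECT_A_ID \
  --member="serviceAccount:service-PROJECT_B_NUMBER@gcp-sa-datafusion.iam.gserviceaccount.com" \
  --role="roles/bigquery.admin"
gcloud projects add-iam-policy-binding PROJECT_A_ID \
  --member="serviceAccount:service-PROJECT_B_NUMBER@gcp-sa-datafusion.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataOwner"
# Repeat for the Compute Engine default service account of Project B
gcloud projects add-iam-policy-binding PROJECT_A_ID \
  --member="serviceAccount:PROJECT_B_NUMBER-compute@developer.gserviceaccount.com" \
  --role="roles/bigquery.admin"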
Can you try downloading the service key JSON file to your local computer? Then put the file into some folder and provide the full path to that service key file in the BigQuery properties.

Copying objects from a directory in one bucket to a folder in another bucket using Transfer

I want to use Google Transfer to copy all folders/files in a specific directory in Bucket-1 to the root directory of Bucket-2.
I have tried to use Transfer with the filter option, but it doesn't copy anything across.
Any pointers on getting this to work within Transfer, or a step-by-step for Cloud Functions, would be really appreciated.
I reproduced your issue and it worked for me using gsutil.
For example:
gsutil cp -r gs://SourceBucketName/example.txt gs://DestinationBucketName
Furthermore, I tried to copy using the Transfer option and it also worked. The steps I followed with the Transfer option are these:
1 - Create a new Transfer Job
Panel “Select Source”:
2 - Select your source, for example a Google Cloud Storage bucket
3 - Select the bucket with the data you want to copy.
4 - In the field “Transfer files with these prefixes”, add your file prefix (I used “example.txt”)
Panel “Select destination”:
5 - Select your destination Bucket
Panel “Configure transfer”:
6 - Run now if you want to complete the transfer now.
7 - Press “Create”.
For more information about copying from one bucket to another, you can check the official documentation.
So, a few things to consider here:
You have to keep in mind that Google Cloud Storage buckets don’t treat subdirectories the way you would expect. To the bucket it is basically all part of the file name. You can find more information about that in the How Subdirectories Work documentation.
The previous point is also the reason why you cannot transfer a file that is inside a “directory” and expect to see only the file’s name appear in the root of your target bucket. To give you an example:
If you have a file at gs://my-bucket/my-bucket-subdirectory/myfile.txt, once you transfer it to your second bucket it will still have the subdirectory in its name, so the result will be: gs://my-second-bucket/my-bucket-subdirectory/myfile.txt
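If you do want the objects to land in the root of the second bucket without the subdirectory prefix, one workaround is a small shell loop around gsutil that strips the prefix per object. This is only a sketch using the hypothetical bucket and prefix names from the example above:
SRC_PREFIX="gs://my-bucket/my-bucket-subdirectory/"
DST_BUCKET="gs://my-second-bucket/"
# List every object under the prefix and copy it to the second bucket without that prefix
gsutil ls "${SRC_PREFIX}**" | while read -r obj; do
  gsutil cp "$obj" "${DST_BUCKET}${obj#"$SRC_PREFIX"}"
done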
This is why, if you are interested in automating this process, you should definitely give the Google Cloud Storage Client Libraries a try.
Additionally, you could use the GCS client together with Google Cloud Functions. However, I would only suggest this if you really need the event triggers offered by GCF. If you just want the transfer to run regularly, for example on a cron job, you can use the GCS client somewhere other than a Cloud Function.
The Cloud Storage Tutorial might give you a good example of how to handle Storage events.
Also, in your future posts, try to provide as much relevant information as possible. For this post, for example, it would have been nice to know the file structure of your buckets and what output you were getting. Stating your use case up front will also prevent other users from suggesting solutions that don't apply to your needs.
Try this in Cloud Shell in the project:
gsutil cp -r gs://bucket1/foldername gs://bucket2

How to move S3 data one partition level up?

Let's say I have a table at the path
s3://bucketname/tablename/month/day/deviceid
I want to move all files to s3://bucketname/tablename/month/day/. In other words, I want to ignore the last partition.
Note that there is already data in the path s3://bucketname/tablename/month/day/.
Can I use aws s3 to achieve this?
I have more than 100K files, so I cannot do this manually.
I believe you can use the AWS console on the web. Open the source folder (deviceid) and select all of its contents using the checkbox to the left of Name.
Then select Actions → Copy.
Open the destination folder (day), click Actions → Paste, and confirm the operation.
Once the operation has finished successfully, you can delete the source folder with all its contents.
You can do this in two ways.
If you are not familiar with the AWS CLI, go to the console, select all the files, and cut and paste them into the desired directory. However, this is a very time-consuming and inefficient task.
I recommend the AWS CLI for S3. You can find some sample commands here.
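For example, a minimal sketch with the AWS CLI, using the placeholder paths from the question; --recursive moves every object under the deviceid prefix up one level:
# Preview first with --dryrun, then run for real
aws s3 mv s3://bucketname/tablename/month/day/deviceid/ \
          s3://bucketname/tablename/month/day/ --recursive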

Remove Incomplete Multipart Upload files from AWS S3

Actually, we have an application for file storage (like Dropbox) which uses an AWS S3 bucket.
We have different plans for end users, like Free and Silver/paid, depending on the size of the file.
Sometimes a user uploads a file and the upload process gets interrupted for some reason, like:
1 - The user cancels the upload process in the middle
2 - A network glitch between the user's internet connection and AWS S3
In the above cases, if for example a user tries to upload a 1GB file and cancels it in the middle of the upload process, 50% (0.5GB) of the file has already been uploaded to S3.
That uploaded part stays in the S3 bucket, so it occupies space on S3 and we also have to pay for that 0.5GB.
I want the uploaded part of the file to be deleted from S3 after some time (or right away) whenever the upload process is cancelled by the end user or interrupted by a network issue.
How can I define a lifecycle rule for the S3 bucket to accomplish this requirement?
You can create a new rule for incomplete multipart uploads using the Console:
1) Start by opening the console and navigating to the desired bucket
2) Then click on Properties, open up the Lifecycle section, and click on Add rule:
3) Decide on the target (the whole bucket or the prefixed subset of your choice) and then click on Configure Rule:
4) Then enable the new rule and select the desired expiration period:
5) As a best practice, we recommend that you enable this setting even if you are not sure that you are actually making use of multipart uploads. Some applications will default to the use of multipart uploads when uploading files above a particular, application-dependent, size.
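If you would rather configure this rule from the CLI, here is a minimal sketch using the AWS CLI, with a placeholder bucket name and a 7-day window; parts of multipart uploads left incomplete for longer than that are aborted and cleaned up automatically. Note that this call replaces any existing lifecycle configuration on the bucket, so merge any other rules you already have into the same JSON:
aws s3api put-bucket-lifecycle-configuration \
  --bucket your-bucket-name \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "abort-incomplete-multipart-uploads",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7}
      }
    ]
  }'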
In the same Lifecycle section you can also set up a rule to remove delete markers for expired objects that have no previous versions.
You can refer to this AWS blog post for more details.
Note: If you are on the new console: Select Bucket --> Click Management (4th tab) --> Select the Lifecycle tab (1st) --> Click the Add Lifecycle Rule button.