I'm looking for an easy way to run gsutil commands in Google Cloud Workflows.
The specific problem I'm trying to solve is dynamically removing all objects in a Cloud Storage bucket so that the bucket can then be deleted, as part of a Workflows project. gsutil can remove the files with the command gsutil rm -a gs://bucket/**, and if I could run that in a Workflows step, that would be great.
Cloud Workflows can only call APIs, so you have to call the API directly, as gsutil does. I wrote an article on listing all the files and calling a "compose" operation on them. You can customize that code to call the delete API instead.
If you really want to use gsutil, you can use a Cloud Run job, for instance, and execute it with an API call from Workflows.
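As a rough illustration of the "call the API, as gsutil does" route, the sketch below builds the Cloud Storage JSON API requests that would empty a bucket: one objects.list call (with versions=true, to mirror the -a flag) and one objects.delete call per returned item. The function names are hypothetical, authentication and paging through nextPageToken are left out, and a Workflows step would issue the same requests rather than run this Python.

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://storage.googleapis.com/storage/v1"


def list_url(bucket, page_token=None):
    """URL for objects.list; versions=true also returns noncurrent versions."""
    params = {"versions": "true"}
    if page_token:
        params["pageToken"] = page_token
    return f"{API_ROOT}/b/{quote(bucket, safe='')}/o?{urlencode(params)}"


def delete_url(bucket, name, generation=None):
    """URL for objects.delete; the object name must be percent-encoded."""
    url = f"{API_ROOT}/b/{quote(bucket, safe='')}/o/{quote(name, safe='')}"
    if generation is not None:
        # Target one specific version of the object.
        url += f"?generation={generation}"
    return url


def delete_requests(bucket, listing):
    """Turn one objects.list response (a dict) into (method, url) pairs."""
    return [
        ("DELETE", delete_url(bucket, item["name"], item.get("generation")))
        for item in listing.get("items", [])  # "items" is absent when empty
    ]
```

Each pair would be sent with an OAuth bearer token; once every page of the listing has been processed, the bucket itself can be deleted.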
Related
I have an Apache Airflow DAG running on an on-prem server. In the DAG, I want to call the Google Cloud CLI command gsutil to copy a data file into a GCP Storage bucket. To do that, I have to call gcloud auth activate-service-account first, then gsutil cp. Is it possible to merge the two commands into one? Or is it possible to set up default authentication for my GCP service account, so I can skip the first command? Thanks in advance for any help!
Instead of using shell commands to call GCP operations, first set up a GCP connection in Airflow, then consider using Airflow GCP operators such as LocalFilesystemToGCSOperator or GCSToLocalFilesystemOperator, or the Google API within a PythonOperator.
That way, you won't need to run an extra command to authenticate. The gcp_conn_id that you prepared and specified will already handle this step for you.
It is always better to use the official providers' operators/sensors/hooks instead of cumbersome bash commands. You can discover more here.
For instance, I want to run the following gcloud CLI command,
gcloud run services delete [SERVICE]
But, from a triggered Google Cloud Function.
I've looked in a few places and have found a few similar things,
https://www.googlecloudcommunity.com/gc/Infrastructure-Compute-Storage/Automatic-Resource-Deletion/m-p/172865
https://github.com/PolideaInternal/cats-love-money
Create a Google function from a Google cloud function
But, I find them a bit tricky to follow and/or replicate.
The Google Cloud CLI is a Python program. That means a lot of dependencies and a requirement for a shell and OS environment. Cloud Functions does not provide either.
A better option for running the CLI is Cloud Run. This provides the additional benefit of being able to test the Cloud Run container locally. You will need to wrap the CLI with an HTTP server responding to HTTP requests which then execute the CLI.
However, most CLI commands can be easily duplicated with the Python SDKs and/or direct REST API calls which are supported by Cloud Functions. This will require a solid understanding of the services, APIs, and language.
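The Cloud Run wrapper described above can be sketched with nothing but the Python standard library. This is a minimal illustration, assuming the container image has gcloud installed and authorized through its service account; the JSON body fields service and region, and the names build_command, Handler and serve, are hypothetical.

```python
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_command(service, region):
    """Argument list for deleting one Cloud Run service without a prompt."""
    return ["gcloud", "run", "services", "delete", service,
            "--region", region, "--quiet"]


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        if "service" not in body or "region" not in body:
            self._reply(400, b"expected JSON with 'service' and 'region'\n")
            return
        # A list of arguments (no shell) keeps request data out of a shell.
        result = subprocess.run(
            build_command(body["service"], body["region"]),
            capture_output=True, text=True,
        )
        status = 200 if result.returncode == 0 else 500
        self._reply(status, (result.stdout + result.stderr).encode())

    def _reply(self, status, payload):
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port):
    """Block forever, handling one request at a time."""
    HTTPServer(("", port), Handler).serve_forever()
```

Cloud Run passes the listening port in the PORT environment variable, so the container entry point would call something like serve(int(os.environ.get("PORT", "8080"))) after importing os.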
I have a relatively simple task, but I'm struggling to find the best mix of AWS services to accomplish it:
1. I have a simple Java program (provided by a third party; I can't modify it, only use it) that I can run anywhere with java -jar --target-location "path on local disc". Once executed, the program creates a CSV file on the local disk at the path defined in --target-location.
2. Once the file is created, I need to upload it to S3.
Currently I do this on a dedicated EC2 instance with Java installed: the first point is covered by java -jar ... and the second by the aws s3 cp ... command.
I'm looking for a better way of doing that (preferably serverless). I'm wondering whether the above points can be accomplished with an AWS Glue job of type Python Shell. The second point (copying a local file to S3) I can most likely cover with boto3, but I'm not sure about the first (executing java -jar).
Am I forced to use an EC2 instance, or do you see a smarter way with AWS Glue?
Or would the most effective approach be to build a Docker image (containing these two instructions), register it in ECR, and run it with AWS Batch?
I'm looking for a better way of doing that (preferably serverless).
I cannot tell whether a serverless option is better, but an EC2 instance will do the job just fine. Assuming that you have CentOS on your instance, you can do it with either of the following:
aaPanel GUI
Some useful web panels offer cron scheduled tasks, such as backing up some files from one directory to another S3 directory. I will use aaPanel as an example.
Install aaPanel
Install AWS S3 plugin
Configure the credentials in the plugin.
Cron
Add a scheduled task to back up files from "path on local disc" to AWS S3.
Rclone
A web panel goes beyond the scope of this question. Rclone is another useful tool I use to back up files from local disk to OneDrive, S3, etc.
Installation
curl https://rclone.org/install.sh | sudo bash
Sync
Sync a directory to the remote bucket, deleting any excess files in the bucket.
rclone sync -i /home/local/directory remote:bucket
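For the Docker image idea raised in the question, the two instructions could be chained in one small script that serves as the container entry point. The sketch below is only an illustration: it assumes --target-location names a directory, that java and the AWS CLI are installed in the image, and that jar_path and s3_uri are supplied by the caller. The function names are hypothetical.

```python
import subprocess
import tempfile


def build_steps(jar_path, target_dir, s3_uri):
    """Return the two commands from the question as argument lists."""
    return [
        ["java", "-jar", jar_path, "--target-location", target_dir],
        ["aws", "s3", "cp", target_dir, s3_uri, "--recursive"],
    ]


def run_job(jar_path, s3_uri, run=subprocess.run):
    """Generate the CSV in a scratch directory, then upload that directory."""
    with tempfile.TemporaryDirectory() as target_dir:
        for cmd in build_steps(jar_path, target_dir, s3_uri):
            run(cmd, check=True)  # raises on the first failing step
```

The run parameter exists only so the sequence can be exercised without Java or AWS credentials; in the container it defaults to subprocess.run.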
We have terabytes of data in Google Cloud Storage that we want to migrate to AWS S3. What are the best ways to do it? Is there any third-party tool that would be better than a direct transfer?
Several options are available for a cloud-to-cloud migration that need no physical transfer device and take less time.
Use gsutil to copy data from a Google Cloud Storage bucket to an Amazon S3 bucket, using a command such as:
gsutil -m rsync -r gs://your-gcp-bucket s3://your-aws-s3-bucket
More details are available at https://cloud.google.com/storage/docs/gsutil/commands/rsync
Note: if transfer speed is a problem in the default Cloud Shell, you can create a larger VM and run the above command from there.
I have a script that does some stuff with gcloud utils like gsutil and bq, eg:
#!/usr/bin/env bash
bq query --use_legacy_sql=false --format=csv "SELECT * FROM myproject.whatever WHERE date > $x" > res.csv
gsutil cp res.csv gs://my_storage/foo.csv
This works on my machine or VM, but I can't guarantee it will always be on, so I'd like to add this as a GCP cronjob/Lambda type of thing. From the docs here, it looks like the Cloud Scheduler can only do HTTP requests, Pub/Sub, or App Engine HTTP, none of which are exactly what I want.
So: is there any way in GCP to automate some gsutil / bq commands, like a cronjob, but without my supplying an always-on machine?
There are likely going to be multiple answers and this is but one.
For me, I would examine the concept of Google Cloud Run. The idea here is that you get to create a Docker image that is then instantiated, run and cleaned up when called by a REST request. What you put in your docker image is 100% up to you. It could be a simple image with tools like gcloud and gsutil installed with a script to run them with any desired parameters. Your contract with Cloud Run is only that you consume the incoming HTTP request.
When there are no requests to Cloud Run, there is no charge as nothing is running. You are only billed for the duration that your logic actually executes for.
I recommend Cloud Run over Cloud Functions as Cloud Run allows you to define the environment in which the commands run ... for example ... availability of gsutil.
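Inside such an image, the script from the question could be driven from Python so that the date arrives as a parameter instead of a shell variable. This is a sketch under assumptions: it uses bq's --parameter flag with a DATE type (the real column type is not stated in the question), keeps the table and bucket names from the question, and the names build_steps and run_export are hypothetical.

```python
import subprocess


def build_steps(min_date, local_csv="res.csv"):
    """Argument lists for the bq query and the gsutil upload."""
    query = "SELECT * FROM myproject.whatever WHERE date > @min_date"
    bq = ["bq", "query", "--use_legacy_sql=false", "--format=csv",
          f"--parameter=min_date:DATE:{min_date}", query]
    gsutil = ["gsutil", "cp", local_csv, "gs://my_storage/foo.csv"]
    return bq, gsutil


def run_export(min_date, local_csv="res.csv", run=subprocess.run):
    """Write the query result to local_csv, then copy it to the bucket."""
    bq, gsutil = build_steps(min_date, local_csv)
    with open(local_csv, "w") as out:
        run(bq, stdout=out, check=True)  # same effect as "> res.csv"
    run(gsutil, check=True)
```

A Cloud Scheduler HTTP job could then trigger the Cloud Run service on a cron schedule, with the request handler calling run_export.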