I would like to automate exports of our Spanner database to Google Cloud Storage. Is this possible using the gcloud SDK? I could not find a command for this.
Is there any other recommended way to back up Spanner databases?
The export and import pipelines are Dataflow templates that can be started with the gcloud command.
See the third paragraph in:
https://cloud.google.com/spanner/docs/export
And how to run the template in:
https://cloud.google.com/dataflow/docs/guides/templates/provided-templates#cloud_spanner_to_gcs_avro
(Select the gcloud tab in the "Executing the template" section.)
Yes, it is possible to do this using gcloud, but it is not a direct Cloud Spanner command. The detailed documentation is at https://cloud.google.com/spanner/docs/export.
Essentially, you use gcloud to run a Cloud Dataflow job that exports or backs up your data to GCS, using a command like the following:
gcloud dataflow jobs run [JOB_NAME] \
--gcs-location='gs://dataflow-templates/latest/Cloud_Spanner_to_GCS_Avro' \
--region=[DATAFLOW_REGION] \
--parameters='instanceId=[YOUR_INSTANCE_ID],databaseId=[YOUR_DATABASE_ID],outputDir=[YOUR_GCS_DIRECTORY]'
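To automate this, the command can be wrapped in a small script and run on a schedule (for example from cron or Cloud Scheduler). A minimal sketch, where the project, instance, database and bucket names are placeholder assumptions:
#!/bin/bash
# Export a Spanner database to GCS via the Cloud_Spanner_to_GCS_Avro Dataflow template.
set -euo pipefail

PROJECT_ID="my-project"            # assumption: your project ID
INSTANCE_ID="my-instance"          # assumption: your Spanner instance
DATABASE_ID="my-database"          # assumption: your Spanner database
REGION="us-central1"               # assumption: the Dataflow region
BUCKET="gs://my-spanner-exports"   # assumption: an existing GCS bucket

# Timestamped output directory so each export is kept separately.
STAMP="$(date +%Y%m%d-%H%M%S)"
OUTPUT_DIR="${BUCKET}/${STAMP}"

gcloud dataflow jobs run "spanner-export-${STAMP}" \
  --project="${PROJECT_ID}" \
  --region="${REGION}" \
  --gcs-location='gs://dataflow-templates/latest/Cloud_Spanner_to_GCS_Avro' \
  --parameters="instanceId=${INSTANCE_ID},databaseId=${DATABASE_ID},outputDir=${OUTPUT_DIR}"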
I'm new to GCP. I've been going through various documents on GCP Composer and Cloud Shell, but I can't find anything that explains how to connect the Cloud Shell environment to the Composer DAG folder.
Right now I'm creating the Python script outside Cloud Shell (on my local system) and uploading it to the DAG folder manually, but I want to do all of this from Cloud Shell. Can anyone give me directions on how?
Also, when I tried to use import airflow in my Python file on Cloud Shell, it gave me a "module not found" error. How do I install that too?
Take a look at this GCP documentation:
Adding and Updating DAGs (workflows)
Among many other entries, you will find information like this:
Determining the storage bucket name
To determine the name of the storage bucket associated with your environment:
gcloud composer environments describe ENVIRONMENT_NAME \
--location LOCATION \
--format="get(config.dagGcsPrefix)"
where:
ENVIRONMENT_NAME is the name of the environment.
LOCATION is the Compute Engine region where the environment is located.
--format is an option to specify only the dagGcsPrefix property instead of all environment details.
The dagGcsPrefix property shows the bucket name:
gs://region-environment_name-random_id-bucket/
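For example, from Cloud Shell you could capture that value in a variable and list the DAG files already in the bucket (the environment name and location below are placeholder assumptions):
# Store the environment's DAG folder prefix.
DAGS_PREFIX=$(gcloud composer environments describe my-environment \
  --location us-central1 \
  --format="get(config.dagGcsPrefix)")

# List the DAG files currently in the environment's bucket.
gsutil ls "${DAGS_PREFIX}"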
Adding or updating a DAG
To add or update a DAG, move the Python .py file for the DAG to the environment's dags folder in Cloud Storage.
gcloud composer environments storage dags import \
--environment ENVIRONMENT_NAME \
--location LOCATION \
--source LOCAL_FILE_TO_UPLOAD
where:
ENVIRONMENT_NAME is the name of the environment.
LOCATION is the Compute Engine region where the environment is located.
LOCAL_FILE_TO_UPLOAD is the DAG to upload.
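Putting it together, uploading a DAG that you wrote directly in Cloud Shell might look like this (the file, environment and location names are placeholder assumptions):
# Upload a DAG created in the Cloud Shell home directory to the environment's dags folder.
gcloud composer environments storage dags import \
  --environment my-environment \
  --location us-central1 \
  --source ~/my_dag.py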
I am very new to Google Cloud Platform, and I am doing a POC for moving a Hive application (tables and jobs) to Google Dataproc. The data has already been moved to Google Cloud Storage.
Is there a built-in way to create all the Hive tables in Dataproc in bulk, instead of creating them one by one at the Hive prompt?
Dataproc supports the Hive job type, so you can use the gcloud command:
gcloud dataproc jobs submit hive --cluster=CLUSTER \
-e 'create table t1 (id int, name string); create table t2 ...;'
or
gcloud dataproc jobs submit hive --cluster=CLUSTER -f create_tables.hql
You can also SSH into the master node, then use beeline to execute the script:
beeline -u jdbc:hive2://localhost:10000 -f create_tables.hql
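Since the data is already in GCS, the DDL script could define external tables that point at the existing files. A minimal sketch of generating and submitting such a script, where the table definitions, file format and bucket path are placeholder assumptions:
# Write a DDL script with external tables over the existing GCS data
# (table names, columns, format and bucket path are placeholders).
cat > create_tables.hql <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS t1 (id INT, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION 'gs://my-bucket/hive/t1/';

CREATE EXTERNAL TABLE IF NOT EXISTS t2 (id INT, value DOUBLE)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION 'gs://my-bucket/hive/t2/';
EOF

# Submit the whole script as a single Hive job.
gcloud dataproc jobs submit hive --cluster=CLUSTER -f create_tables.hql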
I have logging enabled on my 2nd Generation Cloud SQL instances in GCP; however, when I attempt to read these logs using the CLI I draw a blank.
If I run $ gcloud logging logs list I can see the logs I want to read, example as follows:
projects/<project name>/logs/cloudsql.googleapis.com%2Fmysql-slow.log
projects/<project name>/logs/cloudsql.googleapis.com%2Fmysql.err
The docs are confusing, but it looks like I should be able to read them if I run:
gcloud logging read "logName=projects/<project name>/logs/cloudsql.googleapis.com%2Fmysql.err" --limit 10 --format json
However, this only returns an empty array: []
I just want to read out the logs.
What am I doing wrong?
You have to execute gcloud logging read projects/<project name>/logs/cloudsql.googleapis.com%2Fmysql-slow.log and gcloud logging read projects/<project name>/logs/cloudsql.googleapis.com%2Fmysql.err, as specified in the "Quickstart using Cloud SDK" documentation. Do not use the " characters in the command.
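For example (the project name is a placeholder; note also that gcloud logging read only looks at recent entries by default, and its --freshness flag can widen that window):
# Read the 10 most recent MySQL error-log entries from the last 7 days as JSON.
gcloud logging read projects/my-project/logs/cloudsql.googleapis.com%2Fmysql.err \
  --limit=10 \
  --format=json \
  --freshness=7d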
I'm new to Google Cloud and I'm trying to experiment with it.
I can see that preparing scripts is pretty much vital if I want to create and delete clusters every day.
For Dataproc clusters, it's easy:
gcloud dataproc clusters create spark-6-m \
--async \
--project=my-project-id \
--region=us-east1 \
--zone=us-east1-b \
--bucket=my-project-bucket \
--image-version=1.2 \
--num-masters=1 \
--master-boot-disk-size=10GB \
--master-machine-type=n1-standard-1 \
--worker-boot-disk-size=10GB \
--worker-machine-type=n1-standard-1 \
--num-workers=6 \
--initialization-actions=gs://dataproc-initialization-actions/jupyter2/jupyter2.sh
Now I'd like to create a Cassandra cluster. I see that Cloud Launcher allows me to do that easily too, but I can't find a gcloud command to automate it.
Is there a way to create Cloud Launcher product clusters via gcloud?
Thanks
Cloud Launcher deployments can be replicated from the Cloud Shell using Custom Deployments [1].
Once the Cloud Launcher deployment (in this case a Cassandra cluster) is finished the details of the deployment can be seen in the Deployment Manager [2].
The deployment details have an Overview section with the configuration and the imported files used for the deployment process. Download the “Expanded Config” file; this will be the .yaml file for the custom deployment [3]. Download the imported files into the same directory as the .yaml file to be able to deploy correctly [4].
These files and this configuration will create a deployment equivalent to the Cloud Launcher one.
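Once the expanded config and its imports are downloaded, re-creating the deployment from the command line would look something like this (the deployment and file names are placeholder assumptions):
# Re-create the Cassandra deployment from the downloaded configuration.
gcloud deployment-manager deployments create my-cassandra-cluster \
  --config expanded-config.yaml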
I would like to know if GCP's DataProc supports WebHCat. Googling hasn't turned up anything.
So, does GCP DataProc support/provide WebHCat and if so what is the URL endpoint?
Dataproc does not provide WebHCat out of the box; however, it's trivial to install it with an initialization action such as:
#!/bin/bash
# Install the WebHCat (Templeton) server package non-interactively.
apt-get install -y hive-webhcat-server
WebHCat will be available on port 50111:
http://my-cluster-m:50111/templeton/v1/ddl/database/default/table/my-table
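Assuming that script were uploaded to a bucket of your own, the cluster could be created with it as an initialization action (the bucket, script and cluster names are placeholder assumptions):
# Create a cluster that installs WebHCat at startup.
gcloud dataproc clusters create my-cluster \
  --initialization-actions=gs://my-bucket/install-webhcat.sh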
Alternatively, it is possible to set up a JDBC connection to HiveServer2 (available by default):
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC
As of now, you can use the Dataproc Hive WebHCat optional component to activate Hive WebHCat during cluster creation:
gcloud dataproc clusters create $CLUSTER_NAME --optional-components=HIVE_WEBHCAT
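After the cluster is up, the endpoint can be checked from the master node, for example (the cluster name is a placeholder assumption):
# Query WebHCat's status endpoint on the master node.
curl http://my-cluster-m:50111/templeton/v1/status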