I am very new to Google Cloud and I am following an example from "Google BigQuery: The Definitive Guide" to learn how to use the platform. I am trying to create a dataset to hold my data and I am typing exactly what is written in the book, but I get the following error: Invalid identifier 'ch04' for mk.
Here is a picture of the command lines I ran.
Can you help me with this problem?
Thanks
I opened Cloud Shell from the Google Cloud Platform console. To the right of the TERMINAL tab there is another tab with your project name that also opens a terminal. I typed the commands there and they worked fine!
To create a dataset from the command line you need to run a bq mk command.
Have a look at the official documentation, as it provides all the necessary details on how to create a dataset. To sum up, you need to follow this syntax:
bq --location=<location> mk \
--dataset \
--default_table_expiration <integer1> \
--default_partition_expiration <integer2> \
--description <description> \
<project-id>:<dataset-name>
For example:
bq --location=US mk -d \
--default_table_expiration 3600 \
--description "This is my first dataset." \
myfirstdataset
I assume that ch04 stands for something from chapter 4 or some other in-book notation that needs to be changed accordingly for the command to work.
In Cloud Shell, check whether your project is set using the command below.
gcloud config list project
If it is not set, set the project ID using the command below:
gcloud config set project <projectid>
Then run the command:
bq --location=US mk ch04
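Once the dataset has been created you can confirm it exists (a quick check, not part of the original answer):
bq ls         # the new ch04 dataset should appear in the list
bq show ch04  # prints the dataset's details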
I need to convert these two gcloud commands to build and run Dataflow jobs using Terraform.
gcloud dataflow flex-template build ${TEMPLATE_PATH} \
--image-gcr-path "${TARGET_GCR_IMAGE}" \
--sdk-language "JAVA" \
--flex-template-base-image ${BASE_CONTAINER_IMAGE} \
--metadata-file "/Users/b.j/g/codebase/g-dataflow/pubsub-lite/src/main/resources/g_pubsublite_to_gcs_metadata.json" \
--jar "/Users/b.j/g/codebase/g-dataflow/pubsub-lite/target/debian/pubsub-lite-0.0.1-SNAPSHOT-uber.jar" \
--env FLEX_TEMPLATE_JAVA_MAIN_CLASS="com.in.g.gr.dataflow.PubSubLiteToGCS"
gcloud dataflow flex-template run "pub-sub-lite-flex-`date +%Y%m%d-%H%M%S`" \
--template-file-gcs-location=$TEMPLATE_FILE_LOCATION \
--parameters=subscription=$SUBSCRIPTION,output=$OUTPUT_DIR,windowSize=$WINDOW_SIZE_IN_SECS,partitionLevel=$PARTITION_LEVEL,numOfShards=$NUM_SHARDS \
--region=$REGION \
--worker-region=$WORKER_REGION \
--staging-location=$STAGING_LOCATION \
--subnetwork=$SUBNETWORK \
--network=$NETWORK
I've tried using the google_dataflow_flex_template_job resource, with which I can run the Dataflow job from the stored Dataflow template (the 2nd gcloud command). Now I need to create the template and the Docker image, as in my 1st gcloud command, using Terraform.
Any inputs on this? And what's the best way to pass the jars used in the 1st gcloud command (placing them in a GCS bucket)?
And what's the best way to pass the jars used in the 1st gcloud command (placing them in a GCS bucket)?
There is no need to manually store these jar files in GCS. The gcloud dataflow flex-template build command builds a Docker container image that includes all the required jar files and uploads the image to the container registry. This image (plus the metadata file) is the only thing needed to run the template.
Now I need to create the template and the Docker image, as in my 1st gcloud command, using Terraform?
AFAIK there is no dedicated Terraform module to build a flex template. I'd try using the terraform-google-gcloud module, which can execute an arbitrary gcloud command, to run gcloud dataflow flex-template build.
If you build your project using Maven, another option is to use jib-maven-plugin to build and upload the container image instead of using gcloud dataflow flex-template build. See these build instructions for an example. You'll still need to upload the JSON image spec (the "Creating Image Spec" section in the instructions) somehow, e.g. using the gsutil command or perhaps Terraform's google_storage_bucket_object, so I think this approach is more complicated.
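For the image spec upload mentioned above, a gsutil one-liner like the following should do (the file name and bucket here are placeholders, not taken from the original question):
# copy the generated image spec to a bucket your Dataflow jobs can read
gsutil cp pubsublite_to_gcs_spec.json gs://MY_BUCKET/templates/pubsublite_to_gcs_spec.json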
I'm trying to create a Docker container that will execute a BigQuery query. I started with the Google-provided image that already has gcloud, and I added my bash script that contains my query. I'm passing my service account key in as a file under /etc/secrets.
Dockerfile
FROM gcr.io/google.com/cloudsdktool/cloud-sdk:latest
COPY main.sh main.sh
main.sh
gcloud auth activate-service-account X@Y.iam.gserviceaccount.com --key-file=/etc/secrets/service_account_key.json
bq query --use_legacy_sql=false
The gcloud command successfully authenticates but can't save its configuration to /.config/gcloud, saying it is read-only. I've tried modifying that folder's permissions during the build and am struggling to get it right.
Is this the right approach, or is there a better way? If this is the right approach, how can I ensure gcloud can write to the necessary folder?
See the example at the bottom of the Usage section.
You ought to be able to combine this into a single docker run command:
KEY="service_account_key.json"
echo "
[auth]
credential_file_override = /certs/${KEY}
" > ${PWD}/config
docker run \
--detach \
--env=CLOUDSDK_CONFIG=/config \
--volume=${PWD}/config:/config \
--volume=/etc/secrets/${KEY}:/certs/${KEY} \
gcr.io/google.com/cloudsdktool/cloud-sdk:latest \
bq query \
--use_legacy_sql=false
Where:
--env sets the container's value for CLOUDSDK_CONFIG. It depends on the first --volume flag, which maps the host's config file that we created in ${PWD} to the container's /config.
The second --volume flag maps the host's /etc/secrets/${KEY} (per your question) to the container's /certs/${KEY}. Change as you wish.
Suitably configured (🤞), you can run bq.
I've not tried this but it should work :-)
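For a quick smoke test you could run the same command without --detach and append a trivial query (the query itself is just a placeholder, not from the original answer):
docker run --rm \
  --env=CLOUDSDK_CONFIG=/config \
  --volume=${PWD}/config:/config \
  --volume=/etc/secrets/${KEY}:/certs/${KEY} \
  gcr.io/google.com/cloudsdktool/cloud-sdk:latest \
  bq query --use_legacy_sql=false 'SELECT 1 AS ok'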
I'm creating docker-machines in Google Cloud with the shell command
docker-machine create --driver google \
--google-project my-project \
--google-zone my-zone \
--google-machine-image debian-cloud/global/images/debian-10-buster-v20191210 \
machine-name
As you can see, I use the image debian-10-buster-v20191210, but I want to switch to a less recent version of the image. The problem is that I can't find where the list of such images (debian-10-buster-v*) is published. Can you please help me find it?
Determining a list of available images can be done using the gcloud command line.
--show-deprecated indicates that you want to see ALL images, not just the latest
--filter= only selects images whose name starts with debian-10-buster
$ gcloud compute images list --filter="name=debian-10-buster" --show-deprecated
NAME                        PROJECT       FAMILY     DEPRECATED  STATUS
debian-10-buster-v20191115  debian-cloud  debian-10  DEPRECATED  READY
debian-10-buster-v20191121  debian-cloud  debian-10  DEPRECATED  READY
debian-10-buster-v20191210  debian-cloud  debian-10              READY
You can find additional information in the gcloud Images List documentation.
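Once you have picked an older image from that list, you can pass it to docker-machine exactly as in your original command, e.g. (untested, using the deprecated v20191115 image from the output above):
docker-machine create --driver google \
    --google-project my-project \
    --google-zone my-zone \
    --google-machine-image debian-cloud/global/images/debian-10-buster-v20191115 \
    machine-name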
I am running the command below on a GCP default VM, and config.yaml has also been created with the necessary fields. However, I am getting a "source directory is not a valid directory" error.
gcloud ml-engine jobs submit training my_job \
  --module-name trainer.task \
  --staging-bucket gs://my-bucket \
  --package-path /my/code/path/trainer \
  --packages additional-dep1.tar.gz,dep2.whl
I have checked all the paths and they are OK, and the data is within them; however, the command is not executing.
Help on the above topic is much appreciated.
Your Python package structure is probably missing an __init__.py. See http://python-packaging.readthedocs.io/en/latest/minimal.html
You can verify it by running python setup.py sdist.
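A minimal sketch of the fix, assuming the paths from the command above and that setup.py lives one level above the trainer package:
# create the missing package marker (an empty file is enough)
touch /my/code/path/trainer/__init__.py
# then check that the package builds cleanly
cd /my/code/path && python setup.py sdist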
I am deploying a model to Google Cloud ML for the first time. I have trained and tested the model locally; it still needs work, but it works OK.
I have uploaded it to Cloud ML and tested it with the same example images I test locally, which I know produce detections (using this tutorial).
When I do this, I get no detections. At first I thought I had uploaded the wrong checkpoint, but I tested it and the same checkpoint works with these images offline. I don't know how to debug further.
When I look at the results, the file
prediction.results-00000-of-00001
is just empty
and the file
prediction.errors_stats-00000-of-00001
contains the following text: ('No JSON object could be decoded', 1)
Is this a sign the detection has run and detected nothing, or is there some problem while running?
Maybe the problem is that I am preparing the images incorrectly for upload?
The logs show no errors at all.
Thank you
EDIT:
I did more tests and tried to run the model locally using the command "gcloud ml-engine local predict" instead of my usual local code. I get the same result as online: no answer at all, but also no error message.
EDIT 2:
I am using a TF_Record file, so I don't understand the JSON error. Here is a copy of my command:
gcloud ml-engine jobs submit prediction ${JOB_ID} \
  --data-format=tf_record \
  --input-paths=gs://MY_BUCKET/data_dir/inputs.tfr \
  --output-path=gs://MY_BUCKET/data_dir/version4 \
  --region us-central1 \
  --model="gcp_detector" \
  --version="Version4"
It works with the following commands:
Model export:
# From tensorflow/models
export PYTHONPATH=$PYTHONPATH:/home/[user]/repos/DeepLearning/tools/models/research:/home/[user]/repos/DeepLearning/tools/models/research/slim
cd /home/[user]/repos/DeepLearning/tools/models/research
python object_detection/export_inference_graph.py \
--input_type encoded_image_string_tensor \
--pipeline_config_path /home/[user]/[path]/ssd_mobilenet_v1_pets.config \
--trained_checkpoint_prefix /[path_to_checkpoint]/model.ckpt-216593 \
--output_directory /[output_path]/output_inference_graph.pb
Cloud execution
gcloud ml-engine jobs submit prediction ${JOB_ID} --data-format=TF_RECORD \
--input-paths=gs://my_inference/data_dir/inputs/* \
--output-path=${YOUR_OUTPUT_DIR} \
--region us-central1 \
--model="model_name" \
--version="version_name"
I don't know exactly which change fixes the issue, but there are some small differences, like tf_record now being TF_RECORD. I hope this helps someone else. Props to Google support for their help (they suggested the changes).
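Once the prediction job finishes, you can inspect the output files mentioned earlier directly in Cloud Storage (a quick check, assuming the same output directory variable as above):
gsutil cat ${YOUR_OUTPUT_DIR}/prediction.results-*
gsutil cat ${YOUR_OUTPUT_DIR}/prediction.errors_stats-*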