Terraform script to build and run Dataflow Flex template - google-cloud-platform

I need to convert these 2 gcloud commands, which build and run Dataflow jobs, to Terraform:
gcloud dataflow flex-template build ${TEMPLATE_PATH} \
--image-gcr-path "${TARGET_GCR_IMAGE}" \
--sdk-language "JAVA" \
--flex-template-base-image ${BASE_CONTAINER_IMAGE} \
--metadata-file "/Users/b.j/g/codebase/g-dataflow/pubsub-lite/src/main/resources/g_pubsublite_to_gcs_metadata.json" \
--jar "/Users/b.j/g/codebase/g-dataflow/pubsub-lite/target/debian/pubsub-lite-0.0.1-SNAPSHOT-uber.jar" \
--env FLEX_TEMPLATE_JAVA_MAIN_CLASS="com.in.g.gr.dataflow.PubSubLiteToGCS"
gcloud dataflow flex-template run "pub-sub-lite-flex-`date +%Y%m%d-%H%M%S`" \
--template-file-gcs-location=$TEMPLATE_FILE_LOCATION \
--parameters=subscription=$SUBSCRIPTION,output=$OUTPUT_DIR,windowSize=$WINDOW_SIZE_IN_SECS,partitionLevel=$PARTITION_LEVEL,numOfShards=$NUM_SHARDS \
--region=$REGION \
--worker-region=$WORKER_REGION \
--staging-location=$STAGING_LOCATION \
--subnetwork=$SUBNETWORK \
--network=$NETWORK
I've tried using the google_dataflow_flex_template_job resource, which lets me run the Dataflow job from the stored template (the 2nd gcloud command). Now I need to create the template and Docker image as per my 1st gcloud command using Terraform.
Any inputs on this? And what's the best way to pass the jars used in the 1st gcloud command (placing them in a GCS bucket)?

And what's the best way to pass the jars used in the 1st gcloud command (placing them in a GCS bucket)?
There is no need to manually store these jar files in GCS. The gcloud dataflow flex-template build command will build a docker container image including all the required jar files and upload the image to the container registry. This image (+ the metadata file) is the only thing needed to run the template.
Now I need to create the template and Docker image as per my 1st gcloud command using Terraform?
AFAIK there is no special terraform module to build a flex template. I'd try using the terraform-google-gcloud module, which can execute an arbitrary gcloud command, to run gcloud dataflow flex-template build.
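For example, a minimal sketch of that approach (not a drop-in config: the module interface is the public terraform-google-modules/gcloud registry module, the resource needs the google-beta provider, and the var.* references are placeholders for the values in your two gcloud commands):
module "build_flex_template" {
  source  = "terraform-google-modules/gcloud/google"
  version = "~> 3.0"

  platform              = "linux"
  create_cmd_entrypoint = "gcloud"
  # Same flags as the 1st gcloud command, joined into a single command line.
  create_cmd_body = join(" ", [
    "dataflow flex-template build ${var.template_path}",
    "--image-gcr-path ${var.target_gcr_image}",
    "--sdk-language JAVA",
    "--flex-template-base-image ${var.base_container_image}",
    "--metadata-file ${var.metadata_file}",
    "--jar ${var.uber_jar}",
    "--env FLEX_TEMPLATE_JAVA_MAIN_CLASS=com.in.g.gr.dataflow.PubSubLiteToGCS"
  ])
}

# Equivalent of the 2nd gcloud command: run a job from the stored template.
resource "google_dataflow_flex_template_job" "pubsub_lite_to_gcs" {
  provider                = google-beta
  name                    = "pub-sub-lite-flex"
  region                  = var.region
  container_spec_gcs_path = var.template_path
  parameters = {
    subscription   = var.subscription
    output         = var.output_dir
    windowSize     = var.window_size_in_secs
    partitionLevel = var.partition_level
    numOfShards    = var.num_shards
  }
  depends_on = [module.build_flex_template]
}
The depends_on makes Terraform build and push the template before launching a job from it. The worker-region/network/staging flags from your run command are omitted here, so check which of them your provider version of google_dataflow_flex_template_job supports.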
If you build your project using Maven, another option is to use jib-maven-plugin to build and upload the container image instead of gcloud dataflow flex-template build. See these build instructions for an example. You'll still need to upload the JSON image spec (the "Creating Image Spec" section in the instructions) somehow, e.g. using the gsutil command or maybe Terraform's google_storage_bucket_object, so I think this approach is more complicated.

Related

How to authenticate a gcloud service account from within a docker container

I'm trying to create a Docker container that will execute a BigQuery query. I started with the Google-provided image, which already has gcloud, and added my bash script that contains my query. I'm passing my service account key as an environment file.
Dockerfile
FROM gcr.io/google.com/cloudsdktool/cloud-sdk:latest
COPY main.sh main.sh
main.sh
gcloud auth activate-service-account X@Y.iam.gserviceaccount.com --key-file=/etc/secrets/service_account_key.json
bq query --use_legacy_sql=false
The gcloud command successfully authenticates but can't save to /.config/gcloud, saying it is read-only. I've tried modifying that folder's permissions during the build and am struggling to get it right.
Is this the right approach, or is there a better way? If this is the right approach, how can I ensure gcloud can write to the necessary folder?
See the example at the bottom of the Usage section.
You ought to be able to combine this into a single docker run command:
KEY="service_account_key.json"
echo "
[auth]
credential_file_override = /certs/${KEY}
" > ${PWD}/config
docker run \
--detach \
--env=CLOUDSDK_CONFIG=/config \
--volume=${PWD}/config:/config \
--volume=/etc/secrets/${KEY}:/certs/${KEY} \
gcr.io/google.com/cloudsdktool/cloud-sdk:latest \
bq query \
--use_legacy_sql=false
Where:
--env sets the container's value of CLOUDSDK_CONFIG. It depends on the first --volume flag, which maps the config file we created in ${PWD} on the host to /config in the container.
The second --volume flag maps the host's /etc/secrets/${KEY} (per your question) to the container's /certs/${KEY}. Change as you wish.
Suitably configured (🤞), you can run bq queries.
I've not tried this but it should work :-)

How can we get GCP Project ID on AI Platform Training?

I want to get my GCP Project ID on AI Platform.
I tried to:
use the metadata server
run gcloud config get-value project
but the AI Platform instance seems to run outside my GCP project.
One thing you can do is to pass --project $PROJECT_ID as an application parameter when you launch the job (docs). As an example based on this sample:
gcloud ai-platform jobs submit training ${JOB_NAME} \
--stream-logs \
# more job configuration parameters...
--config=./config.yaml \
-- \
--project=${PROJECT_ID} \
# more application parameters...
--num-layers=3
Then, in task.py (or the file defined in --module-name) you can add:
import argparse
args_parser = argparse.ArgumentParser()
args_parser.add_argument(
    '--project',
    help='Service Project ID where ML jobs are launched.',
    required=True)
# parse_known_args ignores the job's other application flags (e.g. --num-layers)
args, _ = args_parser.parse_known_args()
and then simply access it with args.project:
logging.info('Project ID: {}'.format(args.project))

Object detection training job fails on GCP

I am running a training job on GCP for object detection using my own dataset. My training job script is like this:
JOB_NAME=object_detection"_$(date +%m_%d_%Y_%H_%M_%S)"
echo $JOB_NAME
gcloud ml-engine jobs submit training $JOB_NAME \
--job-dir=gs://$1 \
--scale-tier BASIC_GPU \
--runtime-version 1.12 \
--packages $PWD/models/research/dist/object_detection-0.1.tar.gz,$PWD/models/research/slim/dist/slim-0.1.tar.gz,/tmp/pycocotools/pycocotools-2.0.tar.gz \
--module-name $PWD/models/research/object_detection.model_main \
--region europe-west1 \
-- \
--model_dir=gs://$1 \
--pipeline_config_path=gs://$1/data/fast_rcnn_resnet101_coco.config
It fails at the following line:
python -m $PWD/models/research/object_detection.model_main --model_dir=gs://my-hand-detector --pipeline_config_path=gs://my-hand-detector/data/fast_rcnn_resnet101_coco.config --job-dir gs://my-hand-detector/
/usr/bin/python: Import by filename is not supported.
Based on the logs, this is the source of the error, as far as I understand. Any help in this regard would be appreciated. Thank you.
I assume that you are using the model_main.py file from the TensorFlow GitHub repository. Using it, I was able to replicate your error message. After troubleshooting, I successfully submitted the training job and could train the model properly.
To address your issue, I suggest you follow this tutorial, paying special attention to the following steps:
Make sure to have an updated version of tensorflow (1.14 doesn’t include all necessary capabilities)
Properly generate TFRecords from input data and upload them to GCS bucket
Configure object detection pipeline (set the proper paths to data and label map)
In my case, I have reproduced the workflow using PASCAL VOC input data (See this).
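As a side note, the failing log line in the question (python -m $PWD/models/research/object_detection.model_main) shows why the import fails: --module-name expects a Python module path, not a filesystem path. A hedged sketch with only that flag changed, everything else as in the original script:
# Only --module-name differs from the question: a module path instead of $PWD/...
gcloud ml-engine jobs submit training $JOB_NAME \
--job-dir=gs://$1 \
--scale-tier BASIC_GPU \
--runtime-version 1.12 \
--packages $PWD/models/research/dist/object_detection-0.1.tar.gz,$PWD/models/research/slim/dist/slim-0.1.tar.gz,/tmp/pycocotools/pycocotools-2.0.tar.gz \
--module-name object_detection.model_main \
--region europe-west1 \
-- \
--model_dir=gs://$1 \
--pipeline_config_path=gs://$1/data/fast_rcnn_resnet101_coco.config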

Is it possible to deploy google dataflow python job template's metadata with the template (in one command)?

I have a Dataflow job written in Python in a main.py file. When I want to deploy it as a template, I use the following command:
python main.py \
--runner=DataflowRunner \
--template_location=gs://dataflow-bucket/templates/my_template \
--project=foo \
--staging_location=gs://dataflow-bucket/staging \
--temp_location=gs://dataflow-bucket/temp
I also have a metadata file named my_template_metadata that I currently have to upload separately using gsutil:
gsutil cp my_template_metadata gs://dataflow-bucket/templates/my_template_metadata
Is there a way to deploy the metadata file at the same time the template is deployed? It would be much more convenient (or is there a rational reason why it's not possible?)
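If the goal is simply a single deploy step, one hedged workaround is a small wrapper script around the two commands already shown above (paths and names taken verbatim from the question); Dataflow just expects the metadata to sit next to the template as <template_name>_metadata:
#!/usr/bin/env bash
# deploy_template.sh - stage the template, then its metadata, in one step.
set -euo pipefail

BUCKET=gs://dataflow-bucket

# 1. Build and stage the classic template (same flags as above).
python main.py \
    --runner=DataflowRunner \
    --template_location=${BUCKET}/templates/my_template \
    --project=foo \
    --staging_location=${BUCKET}/staging \
    --temp_location=${BUCKET}/temp

# 2. Copy the metadata file next to the template.
gsutil cp my_template_metadata ${BUCKET}/templates/my_template_metadata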

GCP cloud ml-engine start job issue

I am running the command below in a GCP default VM, and config.yaml has also been created with the necessary fields. However, I am getting a "source directory is not a valid directory" error.
gcloud ml-engine jobs submit training my_job \
--module-name trainer.task \
--staging-bucket gs://my-bucket \
--package-path /my/code/path/trainer \
--packages additional-dep1.tar.gz,dep2.whl
I have checked all the paths and they are OK, and the data is within them; however, the command is not executing.
Help on the above topic is much appreciated.
Your Python package structure is probably missing an __init__.py. See http://python-packaging.readthedocs.io/en/latest/minimal.html
You can verify it by running setup.py sdist.
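For reference, a hedged sketch of what that fix looks like on disk, using the paths from your command (the setup.py location follows the linked packaging guide and is an assumption about your project layout):
# Expected layout for --package-path /my/code/path/trainer:
#
#   /my/code/path/
#       setup.py
#       trainer/
#           __init__.py     # may be empty, but must exist
#           task.py
touch /my/code/path/trainer/__init__.py      # add the missing package marker

# Quick local check that the package builds before submitting the job:
cd /my/code/path && python setup.py sdist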