Running a dataflow batch using flexRSGoal - google-cloud-platform

I found this article about running a Dataflow batch job on preemptible machines.
I tried to use this feature with this script:
gcloud beta dataflow jobs run $JOB_NAME \
--gcs-location gs://.../Datastore_to_Datastore_Delete \
--flexRSGoal=COST_OPTIMIZED \
--region ...1 \
--staging-location gs://.../temp \
--network XXX \
--subnetwork regions/...1/subnetworks/... \
--max-workers 1 \
--parameters \
datastoreReadGqlQuery="$QUERY",\
datastoreReadProjectId=$PROJECTID,\
datastoreDeleteProjectId=$PROJECTID
But this is the result:
ERROR: (gcloud.beta.dataflow.jobs.run) unrecognized arguments:
--flexRSGoal=COST_OPTIMIZED
To search the help text of gcloud commands, run: gcloud help --
SEARCH_TERMS
I ran the command gcloud beta dataflow jobs run help and it seems the flexRSGoal option is not there...
# gcloud version
Google Cloud SDK 319.0.0
alpha 2020.11.13
beta 2020.11.13
bq 2.0.62
core 2020.11.13
gsutil 4.55
kubectl 1.16.13
What am I missing?

Have you followed this? It seems that the correct command should be:
--flexrs_goal=COST_OPTIMIZED

It seems the --flexrs_goal flag [1] is not intended for the gcloud beta dataflow jobs run command, but for the Java/Python commands that launch a pipeline directly, for example the python3 -m ... commands shown in [2] (reading that doc in full is recommended).
So instead of using:
gcloud beta dataflow jobs run <job_name>
--flexRSGoal=COST_OPTIMIZED ...
Run:
python3 <my-pipeline-script.py> \
--flexrs_goal=COST_OPTIMIZED ...
If you prefer to use Java, just switch the --flexrs_goal flag to --flexRSGoal and follow [3] instead of [2].
[1] https://cloud.google.com/dataflow/docs/guides/flexrs#python
[2] https://cloud.google.com/dataflow/docs/quickstarts/quickstart-python#run-wordcount-on-the-dataflow-service
[3] https://cloud.google.com/dataflow/docs/quickstarts/quickstart-java-maven
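As a minimal sketch of the distinction above: the flag belongs on the pipeline launch command, not on gcloud. The helper below only assembles that command line. build_pipeline_argv, the script name, and the extra option are hypothetical and not part of any SDK, and the accepted values are the ones I understand the FlexRS guide [1] to document.

```python
import shlex

# Goal values as documented for the Python option (assumption based on [1]).
FLEXRS_GOALS = {"COST_OPTIMIZED", "SPEED_OPTIMIZED"}


def build_pipeline_argv(script, flexrs_goal, extra_options=()):
    """Return the argv for launching a Python pipeline script with FlexRS.

    `script` and `extra_options` are placeholders supplied by the caller.
    """
    if flexrs_goal not in FLEXRS_GOALS:
        raise ValueError(f"unsupported flexrs_goal: {flexrs_goal!r}")
    # --flexrs_goal is passed to the pipeline script, not to gcloud.
    return ["python3", script, f"--flexrs_goal={flexrs_goal}", *extra_options]


if __name__ == "__main__":
    argv = build_pipeline_argv(
        "my_pipeline.py",  # hypothetical script name
        "COST_OPTIMIZED",
        ["--runner=DataflowRunner"],
    )
    print(shlex.join(argv))
```

Rejecting unknown values early catches typos such as COST_OPTIMIZE before a job is submitted.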

Related

Terraform script to build and run Dataflow Flex template

Need to convert these 2 gcloud commands to build and run dataflow jobs using Terraform.
gcloud dataflow flex-template build ${TEMPLATE_PATH} \
--image-gcr-path "${TARGET_GCR_IMAGE}" \
--sdk-language "JAVA" \
--flex-template-base-image ${BASE_CONTAINER_IMAGE} \
--metadata-file "/Users/b.j/g/codebase/g-dataflow/pubsub-lite/src/main/resources/g_pubsublite_to_gcs_metadata.json" \
--jar "/Users/b.j/g/codebase/g-dataflow/pubsub-lite/target/debian/pubsub-lite-0.0.1-SNAPSHOT-uber.jar" \
--env FLEX_TEMPLATE_JAVA_MAIN_CLASS="com.in.g.gr.dataflow.PubSubLiteToGCS"
gcloud dataflow flex-template run "pub-sub-lite-flex-`date +%Y%m%d-%H%M%S`" \
--template-file-gcs-location=$TEMPLATE_FILE_LOCATION \
--parameters=subscription=$SUBSCRIPTION,output=$OUTPUT_DIR,windowSize=$WINDOW_SIZE_IN_SECS,partitionLevel=$PARTITION_LEVEL,numOfShards=$NUM_SHARDS \
--region=$REGION \
--worker-region=$WORKER_REGION \
--staging-location=$STAGING_LOCATION \
--subnetwork=$SUBNETWORK \
--network=$NETWORK
I've tried the google_dataflow_flex_template_job resource, which lets me run the Dataflow job from a stored template (the 2nd gcloud command). Now I need to create the template and Docker image as per my 1st gcloud command using Terraform.
Any input on this? And what's the best way to pass the jars used in the 1st gcloud command (placing them in a GCS bucket)?
And what's the best way to pass the jars used in the 1st gcloud command (placing them in a GCS bucket)?
There is no need to manually store these jar files in GCS. The gcloud dataflow flex-template build command will build a docker container image including all the required jar files and upload the image to the container registry. This image (+ the metadata file) is the only thing needed to run the template.
Now I need to create the template and Docker image as per my 1st gcloud command using Terraform.
AFAIK there is no special terraform module to build a flex template. I'd try using the terraform-google-gcloud module, which can execute an arbitrary gcloud command, to run gcloud dataflow flex-template build.
If you build your project using Maven, another option is using jib-maven-plugin to build and upload the container image instead of using gcloud dataflow flex-template build. See these build instructions for an example. You'll still need to upload the json image spec ("Creating Image Spec" section in the instructions) somehow, e.g. using the gsutil command or maybe using terraform's google_storage_bucket_object, so I think this approach is more complicated.

gcloud beta builds triggers create cloud-source-repositories doesn't work with --dockerfile-image

I'm working on an auto-DevOps workflow based only on the Dockerfile, using Cloud Build on GCP. When I run the following command, it seems the --dockerfile-image flag is ignored:
gcloud beta builds triggers create cloud-source-repositories \
--name="test-trigger-2" \
--repo="projects/nodrize-dev/repos/b722166a-56e0-46af-bd0d-42af8d37c570/bf11672f-34d5-4d8c-80cb-31120f39251a/quirino-backend" \
--branch-pattern="^master$" \
--dockerfile="Dockerfile" \
--dockerfile-dir="" \
--dockerfile-image="gcr.io/nodrize-dev/test-backend"
Created [https://cloudbuild.googleapis.com/v1/projects/nodrize-dev/triggers/896f8ac8-397c-464a-84f7-43e69f1bc6cb].
NAME CREATE_TIME STATUS
test-trigger-2 2021-06-02T21:06:54+00:00
I want to create the trigger now and run it later, but the last flag isn't working. I assume it is using a default or fallback, because as you can see the image name is:
gcr.io/nodrize-dev/b722166a-56e0-46af-bd0d-42af8d37c570/bf11672f-34d5-4d8c-80cb-31120f39251a/quirino-backend:$COMMIT_SHA:
Docker image name in the GCP console:
I hope someone can help me or at least know what is happening.
This works for me.
I suspect that the trigger is incorrect or is not being triggered, and/or the image you are looking at is not the one generated by the trigger.
PROJECT=...
REPO=...
gcloud source repos create ${REPO} \
--project=${PROJECT}
gcloud beta builds triggers create cloud-source-repositories \
--name="trigger" \
--project=${PROJECT} \
--repo=${REPO} \
--branch-pattern="^master$" \
--dockerfile="Dockerfile" \
--dockerfile-dir="." \
--dockerfile-image="gcr.io/${PROJECT}/freddie-01"
NAME CREATE_TIME STATUS
trigger 2021-06-03T15:24:27+00:00
git push google master
gcloud builds list \
--project=${PROJECT} \
--format="value(images)"
gcr.io/${PROJECT}/freddie-01:7dcf74e126af711d24bb2b652d86f0d28bbe3bd9
gcloud container images list \
--project=${PROJECT}
NAME
gcr.io/${PROJECT}/freddie-01

Invoke different entrypoints/modules when training with custom container

I've built a custom Docker container with my training application. The Dockerfile, at the moment, is something like
FROM python:slim
COPY ./src /pipelines/component/src
RUN pip3 install -U ...
...
ENTRYPOINT ["python3", "/pipelines/component/src/training.py"]
so when I run
gcloud ai-platform jobs submit training JOB_NAME \
--region=$REGION \
--master-image-uri=$IMAGE_URI
it goes as expected.
What I'd like to do is to add another module, like /pipelines/component/src/tuning.py; remove the default ENTRYPOINT from Dockerfile; decide which module to call from the gcloud command. So I tried
gcloud ai-platform jobs submit training JOB_NAME \
--region=$REGION \
--master-image-uri=$IMAGE_URI \
--module-name=src.tuning \
--package-path=/pipelines/component/src
It returns Source directory [/pipelines/component] is not a valid directory., because it's searching for the package path on the local machine, instead of the container. How can I solve this problem?
You can use TrainingInput.ReplicaConfig.ContainerCommand field to override the docker image's entrypoint. Here is a sample command:
gcloud ai-platform jobs submit training JOB_NAME \
--region=$REGION \
--master-image-uri=$IMAGE_URI \
--config=config.yaml
And config.yaml content will be something like this:
trainingInput:
  scaleTier: BASIC
  masterConfig:
    containerCommand: ["python3", "/pipelines/component/src/tuning.py"]
This link has more context about the --config flag.
Similarly, you can override the Docker image's command (CMD) with the containerArgs field.
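To choose the module per job, the config file can be generated just before submitting. This is a minimal sketch assuming a hypothetical helper, write_training_config, and relying on JSON being accepted where YAML is expected. The field names are the ones from the config above.

```python
import json
from pathlib import Path


def write_training_config(module_path, config_path="config.yaml", scale_tier="BASIC"):
    """Write a job config whose masterConfig.containerCommand runs `module_path`.

    Hypothetical helper: it only writes the file and does not submit a job.
    """
    config = {
        "trainingInput": {
            "scaleTier": scale_tier,
            "masterConfig": {
                # Overrides the image's ENTRYPOINT for this job only.
                "containerCommand": ["python3", module_path],
            },
        }
    }
    # JSON is a subset of YAML, so the standard library is enough here.
    Path(config_path).write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    write_training_config("/pipelines/component/src/tuning.py")
    print(Path("config.yaml").read_text())
```

The same image can then serve both training.py and tuning.py, with only the generated file changing between submissions.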

GCP cloud ml-engine start job issue

I am running the command below on a default GCP VM, and config.yaml has been created with the necessary fields. However, I am getting a "source directory is not a valid directory" error.
gcloud ml-engine jobs submit training my_job \
--module-name trainer.task \
--staging-bucket gs://my-bucket \
--package-path /my/code/path/trainer \
--packages additional-dep1.tar.gz,dep2.whl
I have checked all the paths and they are OK, and the data is in them, but the command still does not execute.
Any help on this is much appreciated.
Your Python package structure is probably missing an __init__.py. See http://python-packaging.readthedocs.io/en/latest/minimal.html
You can verify the package using python setup.py sdist.
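A quick local check for that condition, as a minimal sketch: missing_init_files is a hypothetical helper, and the path in the usage example is the placeholder from the question. It reports the package root, plus any subdirectory holding .py files, that lacks an __init__.py.

```python
from pathlib import Path


def missing_init_files(package_path):
    """Return package directories under `package_path` that lack __init__.py.

    The root is always checked; subdirectories are checked only if they
    contain at least one .py file.
    """
    root = Path(package_path)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not a directory")
    subdirs = sorted(p for p in root.rglob("*") if p.is_dir())
    missing = []
    for directory in [root, *subdirs]:
        has_python = any(
            f.is_file() and f.suffix == ".py" for f in directory.iterdir()
        )
        if (directory == root or has_python) and not (directory / "__init__.py").exists():
            missing.append(directory)
    return missing


if __name__ == "__main__":
    # Placeholder path taken from the question.
    try:
        for directory in missing_init_files("/my/code/path/trainer"):
            print(f"missing __init__.py in {directory}")
    except NotADirectoryError as err:
        print(err)
```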

ERROR: gcloud crashed (ArgumentError): argument USER_ARGS: unrecognized args: --runtime_version=1.0

The script below was running fine until yesterday morning.
gcloud ml-engine jobs submit training "$JOB_ID" \
--module-name trainer.task \
--package-path trainer \
--staging-bucket "$BUCKET" \
--region us-central1 \
--runtime_version=1.0 \
-- \
--output_path "${GCS_PATH}/training" \
--eval_data_paths "${GCS_PATH}/preproc/eval*" \
--train_data_paths "${GCS_PATH}/preproc/train*" \
--classification_type "multilabel" \
Now it fails with the error below:
ERROR: gcloud crashed (ArgumentError): argument USER_ARGS: unrecognized args: --runtime_version=1.0
The '--' argument must be specified between gcloud specific args on the left and USER_ARGS on the right.
Below are the gcloud components version:
$ gcloud version
Google Cloud SDK 147.0.0
alpha 2016.01.12
app-engine-go
app-engine-go-linux-x86_64 1.9.50
app-engine-java 1.9.50
app-engine-php " "
app-engine-python 1.9.50
beta 2016.01.12
bq 2.0.24
bq-nix 2.0.24
cloud-datastore-emulator 1.2.1
core 2017.03.13
alpha 2016.01.12
core-nix 2016.11.07
datalab 20170309
datalab-nix 20170105
gcd-emulator v1beta3-1.0.0
gcloud
gcloud-deps 2017.03.13
gcloud-deps-linux-x86_64 2017.02.21
gsutil 4.22
gsutil-nix 4.18
kubectl
kubectl-linux-x86_64 1.5.3
pubsub-emulator 2017.02.07
I'm not sure whether something changed in Cloud, or whether I need to check some config on my end that may cause this error.
You might need to use --runtime-version as the name of the argument (hyphen instead of underscore).
Without that, gcloud assumes it's a custom user-defined argument, which it expects to appear in the list after the '--', hence the confusing error message.
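The rule described above can be sketched as a small pre-flight check. It assumes, as the answer suggests, that gcloud's own flags are hyphenated. suspicious_gcloud_flags is a hypothetical helper that inspects only the arguments to the left of the bare '--' and leaves the user arguments alone.

```python
def suspicious_gcloud_flags(argv):
    """Return gcloud-side flag names (before a bare '--') containing underscores."""
    # Everything after the first bare '--' is USER_ARGS and is not checked.
    gcloud_side = argv[: argv.index("--")] if "--" in argv else argv
    flagged = []
    for arg in gcloud_side:
        if arg.startswith("--"):
            name = arg.split("=", 1)[0]
            if "_" in name:
                flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Arguments modeled on the command in the question.
    args = [
        "--module-name", "trainer.task",
        "--package-path", "trainer",
        "--region", "us-central1",
        "--runtime_version=1.0",
        "--",
        "--output_path", "gs://bucket/training",
    ]
    for name in suspicious_gcloud_flags(args):
        print(f"{name}: did you mean {name.replace('_', '-')}?")
```

User arguments such as --output_path are deliberately not flagged, since the training module defines those names itself.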