How can we get GCP Project ID on AI Platform Training? - google-cloud-platform

I want to get my GCP project ID on AI Platform Training.
I tried to:
use the metadata server
run gcloud config get-value project
but the AI Platform instance seems to run outside my GCP project.

One thing you can do is to pass --project $PROJECT_ID as an application parameter when you launch the job (docs). As an example based on this sample:
gcloud ai-platform jobs submit training ${JOB_NAME} \
--stream-logs \
# more job configuration parameters...
--config=./config.yaml \
-- \
--project=${PROJECT_ID} \
# more application parameters...
--num-layers=3
Then, in task.py (or file defined in --module-name) you can add:
args_parser.add_argument(
    '--project',
    help='Service Project ID where ML jobs are launched.',
    required=True)
and then simply access it with args.project:
logging.info('Project ID: {}'.format(args.project))
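Putting the two snippets together, a minimal sketch of what task.py could look like (the --num-layers argument is just the example application parameter from the submission command above):
import argparse
import logging

def get_args():
    # Parses the application parameters passed after "--" in the gcloud command.
    args_parser = argparse.ArgumentParser()
    args_parser.add_argument(
        '--project',
        help='Service Project ID where ML jobs are launched.',
        required=True)
    args_parser.add_argument(
        '--num-layers',
        type=int,
        default=3,
        help='Example hyperparameter from the submission command above.')
    return args_parser.parse_args()

if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    args = get_args()
    logging.info('Project ID: {}'.format(args.project))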

Related

How to create connector in airflow that is of type external provider (like the google-cloud-platform) with the airflow REST API

I'm trying to automate the creation of a connector in Airflow with a GitHub Action, but since it is an external provider, the payload that needs to be sent to the Airflow REST API doesn't work, and I didn't find any documentation on how to do it.
So here is the PAYLOAD I'm trying to send:
PAYLOAD = {
    "connection_id": CONNECTOR,
    "conn_type": "google_cloud_platform",
    "extra": json.dumps({
        "google_cloud_platform": {
            "keyfile_dict": open(CONNECTOR_SERVICE_ACCOUNT_FILE, "r").read(),
            "num_retries": 2,
        }
    })
}
This is based on the Airflow documentation here and on the information I found on the "create connector" page of the Airflow UI:
(screenshot: Airflow UI create connector page)
But I received no error (code 200), and the connector is created, yet it doesn't have the settings I tried to configure.
I confirm the creation works from the UI.
Does anyone have a solution, or a document that describes the exact payload I need to send to the Airflow REST API? Or maybe I'm missing something.
Airflow version : 2.2.3+composer
Cloud Composer version (GCP) : 2.0.3
Github runner version : 2.288.1
Language : Python
Thanks guys and feel free to contact me for further questions.
Bye
@vdolez was right, it's kind of a pain to format the payload into the exact format the Airflow REST API wants. It's something like this:
"{\"extra__google_cloud_platform__key_path\": \"\",
\"extra__google_cloud_platform__key_secret_name\": \"\",
\"extra__google_cloud_platform__keyfile_dict\": \"{}\",
\"extra__google_cloud_platform__num_retries\": 5,
\"extra__google_cloud_platform__project\": \"\",
\"extra__google_cloud_platform__scope\": \"\"}"
And when you need to nest a dictionary inside some of these fields, it's not worth the time and effort. But in case someone wants to know: you have to escape every special character.
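For anyone who still wants to go the REST API route, here is a rough Python sketch of how that flattened extra string could be built with json.dumps instead of escaping it by hand (the field names are taken from the format above; the connection id and key file path are placeholders):
import json

CONNECTOR = "my_gcp_connection"  # placeholder connection id
CONNECTOR_SERVICE_ACCOUNT_FILE = "service_account.json"  # placeholder path

# The key file is itself a JSON document, kept as a string inside the extra field.
with open(CONNECTOR_SERVICE_ACCOUNT_FILE, "r") as f:
    keyfile_dict = f.read()

extra = {
    "extra__google_cloud_platform__key_path": "",
    "extra__google_cloud_platform__key_secret_name": "",
    "extra__google_cloud_platform__keyfile_dict": keyfile_dict,
    "extra__google_cloud_platform__num_retries": 5,
    "extra__google_cloud_platform__project": "",
    "extra__google_cloud_platform__scope": "",
}

PAYLOAD = {
    "connection_id": CONNECTOR,
    "conn_type": "google_cloud_platform",
    # json.dumps handles the escaping that would otherwise be done by hand.
    "extra": json.dumps(extra),
}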
I changed my workflow to notify the relevant users to create the connector manually after my pipeline succeeds.
I will try to contact Airflow/Cloud Composer support to see if we can get a feature for better formatting.
You might be running into encoding/decoding issues while sending data over the web.
Since you're using Composer, it might be a good idea to use Composer CLI to create a connection.
Here's how to run airflow commands in Composer:
gcloud composer environments run ENVIRONMENT_NAME \
--location LOCATION \
SUBCOMMAND \
-- SUBCOMMAND_ARGUMENTS
Here's how to create a connection with the native Airflow commands:
airflow connections add 'my_prod_db' \
--conn-type 'my-conn-type' \
--conn-login 'login' \
--conn-password 'password' \
--conn-host 'host' \
--conn-port 'port' \
--conn-schema 'schema' \
...
Combining the two, you'll get something like:
gcloud composer environments run ENVIRONMENT_NAME \
--location LOCATION \
connections \
-- add 'my_prod_db' \
--conn-type 'my-conn-type' \
--conn-login 'login' \
--conn-password 'password' \
--conn-host 'host' \
--conn-port 'port' \
--conn-schema 'schema' \
...
You could run this in a Docker image where gcloud is already installed.
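Since the original goal was to automate this from a GitHub Action in Python, one option is simply shelling out to the combined command above. A rough sketch, where the environment name, location, connection id, and --conn-extra payload are placeholders (check the connections add reference for your Airflow version for the exact flags):
import json
import subprocess

# Create the Airflow connection through the Composer CLI wrapper shown above.
subprocess.run(
    [
        "gcloud", "composer", "environments", "run", "my-composer-env",
        "--location", "us-central1",
        "connections",
        "--", "add", "my_gcp_connection",
        "--conn-type", "google_cloud_platform",
        "--conn-extra", json.dumps({"extra__google_cloud_platform__num_retries": 2}),
    ],
    check=True,
)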

Terraform script to build and run Dataflow Flex template

I need to convert these 2 gcloud commands, which build and run Dataflow jobs, to Terraform.
gcloud dataflow flex-template build ${TEMPLATE_PATH} \
--image-gcr-path "${TARGET_GCR_IMAGE}" \
--sdk-language "JAVA" \
--flex-template-base-image ${BASE_CONTAINER_IMAGE} \
--metadata-file "/Users/b.j/g/codebase/g-dataflow/pubsub-lite/src/main/resources/g_pubsublite_to_gcs_metadata.json" \
--jar "/Users/b.j/g/codebase/g-dataflow/pubsub-lite/target/debian/pubsub-lite-0.0.1-SNAPSHOT-uber.jar" \
--env FLEX_TEMPLATE_JAVA_MAIN_CLASS="com.in.g.gr.dataflow.PubSubLiteToGCS"
gcloud dataflow flex-template run "pub-sub-lite-flex-`date +%Y%m%d-%H%M%S`" \
--template-file-gcs-location=$TEMPLATE_FILE_LOCATION \
--parameters=subscription=$SUBSCRIPTION,output=$OUTPUT_DIR,windowSize=$WINDOW_SIZE_IN_SECS,partitionLevel=$PARTITION_LEVEL,numOfShards=$NUM_SHARDS \
--region=$REGION \
--worker-region=$WORKER_REGION \
--staging-location=$STAGING_LOCATION \
--subnetwork=$SUBNETWORK \
--network=$NETWORK
I've tried using the resource google_dataflow_flex_template_job, with which I can run the Dataflow job from the stored Dataflow template (the 2nd gcloud command). Now I need to create the template and Docker image as per my 1st gcloud command using Terraform.
Any inputs on this? And what's the best way to pass the jars used in the 1st gcloud command (placing them in a GCS bucket)?
What's the best way to pass the jars used in the 1st gcloud command (placing them in a GCS bucket)?
There is no need to manually store these jar files in GCS. The gcloud dataflow flex-template build command will build a docker container image including all the required jar files and upload the image to the container registry. This image (+ the metadata file) is the only thing needed to run the template.
Now I need to create the template and Docker image as per my 1st gcloud command using Terraform.
AFAIK there is no special terraform module to build a flex template. I'd try using the terraform-google-gcloud module, which can execute an arbitrary gcloud command, to run gcloud dataflow flex-template build.
If you build your project using Maven, another option is using jib-maven-plugin to build and upload the container image instead of using gcloud dataflow flex-template build. See these build instructions for an example. You'll still need to upload the json image spec ("Creating Image Spec" section in the instructions) somehow, e.g. using the gsutil command or maybe using terraform's google_storage_bucket_object, so I think this approach is more complicated.
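If you go the jib route and want to avoid gsutil, a minimal sketch of uploading the image spec JSON with the google-cloud-storage Python client (the bucket, object, and local file names are placeholders):
from google.cloud import storage

# Upload the generated flex template image spec JSON to GCS so the
# "flex-template run" step can reference it.
client = storage.Client()
bucket = client.bucket("my-dataflow-templates-bucket")
blob = bucket.blob("templates/pubsub-lite-to-gcs-spec.json")
blob.upload_from_filename("g_pubsublite_to_gcs_spec.json")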

Invoke different entrypoints/modules when training with custom container

I've built a custom Docker container with my training application. The Dockerfile, at the moment, is something like
FROM python:slim
COPY ./src /pipelines/component/src
RUN pip3 install -U ...
...
ENTRYPOINT ["python3", "/pipelines/component/src/training.py"]
so when I run
gcloud ai-platform jobs submit training JOB_NAME \
--region=$REGION \
--master-image-uri=$IMAGE_URI
it goes as expected.
What I'd like to do is to add another module, like /pipelines/component/src/tuning.py; remove the default ENTRYPOINT from Dockerfile; decide which module to call from the gcloud command. So I tried
gcloud ai-platform jobs submit training JOB_NAME \
--region=$REGION \
--master-image-uri=$IMAGE_URI \
--module-name=src.tuning \
--package-path=/pipelines/component/src
It returns Source directory [/pipelines/component] is not a valid directory., because it's searching for the package path on the local machine instead of inside the container. How can I solve this problem?
You can use the TrainingInput.ReplicaConfig.containerCommand field to override the Docker image's entrypoint. Here is a sample command:
gcloud ai-platform jobs submit training JOB_NAME \
--region=$REGION \
--master-image-uri=$IMAGE_URI \
--config=config.yaml
And config.yaml content will be something like this:
trainingInput:
  scaleTier: BASIC
  masterConfig:
    containerCommand: ["python3", "/pipelines/component/src/tuning.py"]
This link has more context about the --config flag.
Similarly, you can override the Docker image's command (the arguments passed to the entrypoint) with the containerArgs field.
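For example, a config.yaml that keeps the image's ENTRYPOINT and only overrides the arguments passed to it might look like this (the argument values are made-up placeholders):
trainingInput:
  scaleTier: BASIC
  masterConfig:
    containerArgs: ["--epochs=10", "--batch-size=64"]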

Cloud Machine Learning Engine fails to deploy model

I have trained both my own model and the one from the official tutorial.
I'm up to the step to deploy the model to support prediction. However, it keeps giving me an error saying:
"create version failed. internal error happened"
when I attempt to deploy the models by running:
gcloud ml-engine versions create v1 \
--model $MODEL_NAME \
--origin $MODEL_BINARIES \
--python-version 3.5 \
--runtime-version 1.13
*The model binaries should be correct, as I pointed to the folder containing model.pb and the variables folder, e.g. MODEL_BINARIES=gs://$BUCKET_NAME/results/20190404_020134/saved_model/1554343466.
I have also tried to change the region setting for the model as well, but this doesn't help.
It turns out your GCS bucket and the trained model need to be in the same region. This was not explained well in the Cloud ML tutorial, which only says:
Note: Use the same region where you plan on running Cloud ML Engine jobs. The example uses us-central1 because that is the region used in the getting-started instructions.
Also note that a lot of regions cannot be used for both the bucket and model training (e.g. asia-east1).
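If you want to double-check the bucket's region before deploying, a quick sketch with the google-cloud-storage client (the bucket name is a placeholder):
from google.cloud import storage

# Print the bucket's location so it can be compared with the region
# you plan to use when creating the model version.
bucket = storage.Client().get_bucket("my-model-bucket")
print(bucket.location)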

GCP cloud ml-engine start job issue

I am running the command below in a GCP default VM, and config.yaml was also created with the necessary fields. However, I am getting a source directory is not a valid directory error.
gcloud ml-engine jobs submit training my_job \
--module-name trainer.task \
--staging-bucket gs://my-bucket \
--package-path /my/code/path/trainer \
--packages additional-dep1.tar.gz,dep2.whl
I have checked all the paths and they are OK and the data is within them; however, the command is not executing.
Any help on the above topic is much appreciated.
Your Python package structure is probably missing an __init__.py. See http://python-packaging.readthedocs.io/en/latest/minimal.html
You can verify it by running python setup.py sdist.
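For reference, a minimal setup.py sketch for the layout above, placed in the directory that contains the trainer/ package (the name and version are placeholders; find_packages() only picks up trainer/ if it contains an __init__.py):
from setuptools import find_packages, setup

setup(
    name='trainer',
    version='0.1',
    # Discovers the trainer package, provided trainer/__init__.py exists.
    packages=find_packages(),
    install_requires=[],
)
Running python setup.py sdist from that directory should then include trainer/task.py in the generated archive.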