I am new to Google Cloud and was told to use Variant Transforms to get .vcf files into BigQuery. I did everything specified in the Variant Transforms README and copied and pasted the first block of code into a bash file:
#!/bin/bash
# Parameters to replace:
GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
INPUT_PATTERN=gs://BUCKET/*.vcf
OUTPUT_TABLE=GOOGLE_CLOUD_PROJECT:BIGQUERY_DATASET.BIGQUERY_TABLE
TEMP_LOCATION=gs://BUCKET/temp
COMMAND="/opt/gcp_variant_transforms/bin/vcf_to_bq \
--project ${GOOGLE_CLOUD_PROJECT} \
--input_pattern ${INPUT_PATTERN} \
--output_table ${OUTPUT_TABLE} \
--temp_location ${TEMP_LOCATION} \
--job_name vcf-to-bigquery \
--runner DataflowRunner"
gcloud alpha genomics pipelines run \
--project "${GOOGLE_CLOUD_PROJECT}" \
--logging "${TEMP_LOCATION}/runner_logs_$(date +%Y%m%d_%H%M%S).log" \
--zones us-west1-b \
--service-account-scopes https://www.googleapis.com/auth/cloud-platform \
--docker-image gcr.io/gcp-variant-transforms/gcp-variant-transforms \
--command-line "${COMMAND}"
I ran this after replacing the parameters appropriately and got this error:
ERROR: (gcloud.alpha.genomics.pipelines.run) INVALID_ARGUMENT: Error: validating pipeline: zones and regions cannot be specified together
Since then I have tried specifying the region and zone on separate lines, and have even changed the default region and zone. I have also tried example pipelines from Google themselves, and they result in the same error. Am I doing something wrong, or is there something more I need to install for this to work?
You need to pass the --regions flag first and the --zones flag at the end. As a workaround, you can set the default zone and region in your local client. Also keep in mind that the region is "us-west1" and the zone suffix is "b", so the full zone name is "us-west1-b".
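To make the region/zone relationship concrete, here is a minimal sketch that derives the region from a zone name with shell parameter expansion. It assumes zone names follow the `<region>-<letter>` pattern; the `gcloud config set` lines are only printed, not executed, and whether setting defaults resolves this particular error is not something this sketch verifies.

```shell
#!/bin/bash
# Zone taken from the question's script.
ZONE="us-west1-b"

# Assumption: a zone name is "<region>-<letter>", so stripping the last
# "-<suffix>" yields the region.
REGION="${ZONE%-*}"        # us-west1
ZONE_SUFFIX="${ZONE##*-}"  # b

echo "region=${REGION} zone=${ZONE} suffix=${ZONE_SUFFIX}"

# Commands that would set the local client defaults (printed only).
echo "gcloud config set compute/region ${REGION}"
echo "gcloud config set compute/zone ${ZONE}"
```

Printing the commands first makes it easy to check the derived values before changing any local configuration.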
I have a question: how can I find all deployments in a cluster that use a specific image tag?
I want something like that:
kubectl get deployment -A -o jsonpath='{range .items[*]}{.spec.template.spec.containers[*].image}{"\n"}{end}'
But with the deployment names included, and only for specific image tags.
I don't have a cluster available to try this.
You don't want to range but filter, and I think (!?) you won't be able to use kubectl's JSONPath to:
1. Filter only ${TAG} from image: ${REPO}:${TAG} (but you can filter by the value of a whole field, e.g. ${REPO}:${TAG})
2. Return the deployments' metadata.name values
IIRC you can't nest filters, so you can't do .items[?(@.spec.template.spec.containers[?(@.image==\"${IMAGE}\")])].metadata.name
You can enumerate same-level fields, e.g. container.name, if that helps. I haven't tried this but:
IMAGE="..."
FILTER="{
.items[*].spec.template.spec.containers[?(@.image==\"${IMAGE}\")].name
}"
kubectl \
get deployments \
--all-namespaces \
--output=jsonpath="${FILTER}"
This may be better done with a tool like jq:
IMAGE="..."
FILTER="
.items[] |
select(.spec.template.spec.containers[].image==\"${IMAGE}\") |
.metadata.name
"
kubectl \
get deployments \
--all-namespaces \
--output=json \
| jq -r "${FILTER}"
NOTE Using jq you can filter by ${TAG} too. I'll leave that exercise to you.
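As one possible take on the tag-filtering exercise without jq, the sketch below splits the tag off an image reference in plain shell and filters "name<TAB>images" lines by tag. The helper names `image_tag` and `filter_by_tag` and the sample data are hypothetical, and the kubectl invocation in the trailing comment is untested against a live cluster.

```shell
#!/bin/bash
# Print the tag of an image reference, or an empty line if it has none.
# Only the last path component is inspected, so a registry port such as
# "localhost:5000" is not mistaken for a tag; digests ("@sha256:...") are dropped.
image_tag() {
  local last="${1##*/}"
  last="${last%%@*}"
  case "$last" in
    *:*) printf '%s\n' "${last##*:}" ;;
    *)   printf '\n' ;;
  esac
}

# Read "name<TAB>image image ..." lines on stdin and print each name that has
# at least one image carrying the wanted tag.
filter_by_tag() {
  local want="$1" name images image tab
  tab="$(printf '\t')"
  while IFS="$tab" read -r name images; do
    for image in $images; do
      if [ "$(image_tag "$image")" = "$want" ]; then
        printf '%s\n' "$name"
        break
      fi
    done
  done
}

# Hypothetical sample in the same shape the kubectl command below would emit.
SAMPLE="$(printf 'web\tgcr.io/p/web:1.2.3 gcr.io/p/sidecar:latest\napi\tlocalhost:5000/api:2.0.0\ndb\tpostgres')"
MATCHES="$(printf '%s\n' "$SAMPLE" | filter_by_tag "1.2.3")"
echo "$MATCHES"   # prints: web

# Against a cluster, the input could come from something like:
#   kubectl get deployments --all-namespaces \
#     --output=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.template.spec.containers[*].image}{"\n"}{end}' \
#     | filter_by_tag "${TAG}"
```

This keeps JSONPath to the simple range form from the question and moves the tag comparison into the shell, at the cost of assuming image names contain no whitespace.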
I'm working on an auto DevOps workflow based only on the Dockerfile, using Cloud Build on GCP. When I run the following command, it seems to ignore the --dockerfile-image flag:
gcloud beta builds triggers create cloud-source-repositories \
--name="test-trigger-2" \
--repo="projects/nodrize-dev/repos/b722166a-56e0-46af-bd0d-42af8d37c570/bf11672f-34d5-4d8c-80cb-31120f39251a/quirino-backend" \
--branch-pattern="^master$" \
--dockerfile="Dockerfile" \
--dockerfile-dir="" \
--dockerfile-image="gcr.io/nodrize-dev/test-backend"
Created [https://cloudbuild.googleapis.com/v1/projects/nodrize-dev/triggers/896f8ac8-397c-464a-84f7-43e69f1bc6cb].
NAME CREATE_TIME STATUS
test-trigger-2 2021-06-02T21:06:54+00:00
I want to create the trigger now and run it later, but the last flag isn't working. I assume it is using the default or a fallback because, as you can see, the image name is:
gcr.io/nodrize-dev/b722166a-56e0-46af-bd0d-42af8d37c570/bf11672f-34d5-4d8c-80cb-31120f39251a/quirino-backend:$COMMIT_SHA:
Docker image name in the GCP console:
I hope someone can help me or at least knows what is happening.
This works for me.
I suspect that the trigger is incorrect or is not being triggered, and/or that the image you are looking at is not the one generated by the trigger.
PROJECT=...
REPO=...
gcloud source repos create ${REPO} \
--project=${PROJECT}
gcloud beta builds triggers create cloud-source-repositories \
--name="trigger" \
--project=${PROJECT} \
--repo=${REPO} \
--branch-pattern="^master$" \
--dockerfile="Dockerfile" \
--dockerfile-dir="." \
--dockerfile-image="gcr.io/${PROJECT}/freddie-01"
NAME CREATE_TIME STATUS
trigger 2021-06-03T15:24:27+00:00
git push google master
gcloud builds list \
--project=${PROJECT} \
--format="value(images)"
gcr.io/${PROJECT}/freddie-01:7dcf74e126af711d24bb2b652d86f0d28bbe3bd9
gcloud container images list \
--project=${PROJECT}
NAME
gcr.io/${PROJECT}/freddie-01
I have a .sh script that launches a training job submission as follows:
now=$(date +"%Y%m%d_%H%M%S")
JOB_NAME="campign_retention_model__$now"
JOB_DIR="gs://machine_learning_datasets/campaign_retention"
REGION="us-east1"
PYTHON_VERSION='3.5'
RUNTIME_VERSION='1.12'
TRAINER_PACKAGE_PATH="./trainer/"
PACKAGE_STAGING_PATH="gs://machine_learning_datasets/campaign_retention"
CLOUDSDK_PYTHON="/usr/bin/python"
MAIN_TRAINER_MODULE="trainer.task"
gcloud ml-engine jobs submit training $JOB_NAME \
--job-dir $JOB_DIR \
--package-path $TRAINER_PACKAGE_PATH \
--module-name $MAIN_TRAINER_MODULE \
--region $REGION \
--runtime-version=$RUNTIME_VERSION \
--python-version=$PYTHON_VERSION
This works great (note that the .sh is located next to the trainer dir).
Due to external infra requirements, I was forced to save the contents of my project in a bucket named:
"gs://campign_retention_code/camp_ret"
and hand out a standalone .sh, so I changed only the value of TRAINER_PACKAGE_PATH:
now=$(date +"%Y%m%d_%H%M%S")
JOB_NAME="campign_retention_model__$now"
JOB_DIR="gs://machine_learning_datasets/campaign_retention"
REGION="us-east1"
PYTHON_VERSION='3.5'
RUNTIME_VERSION='1.12'
TRAINER_PACKAGE_PATH="gs://campign_retention_code/camp_ret/trainer"
PACKAGE_STAGING_PATH="gs://machine_learning_datasets/campaign_retention"
CLOUDSDK_PYTHON="/usr/bin/python"
MAIN_TRAINER_MODULE="trainer.task"
gcloud ml-engine jobs submit training $JOB_NAME \
--job-dir $JOB_DIR \
--package-path $TRAINER_PACKAGE_PATH \
--module-name $MAIN_TRAINER_MODULE \
--region $REGION \
--runtime-version=$RUNTIME_VERSION \
--python-version=$PYTHON_VERSION
Now when I run it (I moved it to a different location, /Users/yehoshaphatschellekens/Desktop, to make sure it's not next to my project), I get the following error:
ERROR: (gcloud.ml-engine.jobs.submit.training) Source directory [/Users/yehoshaphatschellekens/Desktop/camp_ret] is not a valid directory.
Looking at the packaging-trainer docs, I noticed that there are two examples: one that works like my original script (which, as I said, works perfectly) and another that uses a packaged dependency.
Why won't the submit job recognise my dependencies on GCS? Can't I just point --package-path to a directory on GCS instead of my local dir?
Thanks in advance!
I believe what you are trying to do requires using
--packages gs://path/to/packages
instead of --package-path.
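A minimal dry-run sketch of what that could look like for the script in the question. It assumes the trainer was first built into an archive (for example with `python setup.py sdist`) and uploaded to Cloud Storage; the archive URI and the `build_submit_command` helper are hypothetical, and the command is only printed, not executed.

```shell
#!/bin/bash
# Hypothetical archive location: assumes an sdist of the trainer package was
# built and uploaded here beforehand.
PACKAGE_URI="gs://campign_retention_code/camp_ret/dist/trainer-0.1.tar.gz"
JOB_NAME="campign_retention_model__$(date +"%Y%m%d_%H%M%S")"

# Print the submit command, using --packages with a packaged archive on GCS
# rather than --package-path with a directory.
build_submit_command() {
  local job_name="$1" package_uri="$2"
  case "$package_uri" in
    gs://*.tar.gz|gs://*.whl) ;;
    *)
      echo "expected a gs:// URI to a .tar.gz or .whl archive: $package_uri" >&2
      return 1
      ;;
  esac
  printf '%s ' gcloud ml-engine jobs submit training "$job_name" \
    --job-dir "gs://machine_learning_datasets/campaign_retention" \
    --packages "$package_uri" \
    --module-name trainer.task \
    --region us-east1 \
    --runtime-version=1.12 \
    --python-version=3.5
  printf '\n'
}

SUBMIT_COMMAND="$(build_submit_command "$JOB_NAME" "$PACKAGE_URI")"
echo "$SUBMIT_COMMAND"
```

The guard rejects a bare directory such as gs://campign_retention_code/camp_ret/trainer, on the assumption that --packages expects packaged archives rather than source directories.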
I am running the code below on a default GCP VM, and I have also created config.yaml with the necessary fields. However, I am getting a "source directory is not a valid directory" error.
gcloud ml-engine jobs submit training my_job \
--module-name trainer.task \
--staging-bucket gs://my-bucket \
--package-path /my/code/path/trainer \
--packages additional-dep1.tar.gz,dep2.whl
I have checked all the paths and they are OK, and the data is in them, but the command does not execute.
Any help on this is much appreciated.
Your Python package structure is probably missing an __init__.py. See http://python-packaging.readthedocs.io/en/latest/minimal.html
Then verify it using python setup.py sdist.
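If that diagnosis applies, a quick pre-flight check before submitting might look like the sketch below. The `is_python_package` helper and the throwaway directory layout are hypothetical and only illustrate the check; they do not reproduce gcloud's own validation.

```shell
#!/bin/bash
# Return 0 if the directory exists and contains an __init__.py.
is_python_package() {
  local dir="$1"
  [ -d "$dir" ] && [ -f "$dir/__init__.py" ]
}

# Demo on a throwaway directory (hypothetical layout: <tmp>/trainer/task.py).
DEMO_ROOT="$(mktemp -d)"
mkdir -p "$DEMO_ROOT/trainer"
touch "$DEMO_ROOT/trainer/task.py"

# Before adding __init__.py the check fails.
if is_python_package "$DEMO_ROOT/trainer"; then BEFORE="ok"; else BEFORE="missing"; fi

# After adding it the check passes.
touch "$DEMO_ROOT/trainer/__init__.py"
if is_python_package "$DEMO_ROOT/trainer"; then AFTER="ok"; else AFTER="missing"; fi

echo "before: $BEFORE, after: $AFTER"   # prints: before: missing, after: ok

rm -rf "$DEMO_ROOT"
```

In practice the function would be pointed at the directory passed to --package-path, for example /my/code/path/trainer from the question.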
The script below was running fine until yesterday morning.
gcloud ml-engine jobs submit training "$JOB_ID" \
--module-name trainer.task \
--package-path trainer \
--staging-bucket "$BUCKET" \
--region us-central1 \
--runtime_version=1.0 \
-- \
--output_path "${GCS_PATH}/training" \
--eval_data_paths "${GCS_PATH}/preproc/eval*" \
--train_data_paths "${GCS_PATH}/preproc/train*" \
--classification_type "multilabel"
I am now running into the error below:
ERROR: gcloud crashed (ArgumentError): argument USER_ARGS: unrecognized args: --runtime_version=1.0
The '--' argument must be specified between gcloud specific args on the left and USER_ARGS on the right.
Below are the gcloud component versions:
$ gcloud version
Google Cloud SDK 147.0.0
alpha 2016.01.12
app-engine-go
app-engine-go-linux-x86_64 1.9.50
app-engine-java 1.9.50
app-engine-php " "
app-engine-python 1.9.50
beta 2016.01.12
bq 2.0.24
bq-nix 2.0.24
cloud-datastore-emulator 1.2.1
core 2017.03.13
alpha 2016.01.12
core-nix 2016.11.07
datalab 20170309
datalab-nix 20170105
gcd-emulator v1beta3-1.0.0
gcloud
gcloud-deps 2017.03.13
gcloud-deps-linux-x86_64 2017.02.21
gsutil 4.22
gsutil-nix 4.18
kubectl
kubectl-linux-x86_64 1.5.3
pubsub-emulator 2017.02.07
I'm not sure whether something changed in Cloud, or whether I need to check some config on my end that may cause this error.
You might need to use --runtime-version as the name of the argument (hyphen instead of underscore).
Without that, gcloud assumes it's some custom user-defined argument, which it expects to be in the list after the '--', hence the confusing error message.
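As a rough illustration of that rule, the sketch below scans the arguments that appear before the bare `--` and reports any flag whose name contains an underscore. It assumes gcloud's own flags are hyphenated, which matches the flags in this thread; the `find_underscore_flags` helper and the gs://bucket path are hypothetical.

```shell
#!/bin/bash
# Print every flag before the first bare "--" whose name contains an underscore.
# Arguments after "--" are user arguments and are left alone.
find_underscore_flags() {
  local arg name
  for arg in "$@"; do
    if [ "$arg" = "--" ]; then
      break
    fi
    case "$arg" in
      --*)
        name="${arg%%=*}"   # drop any "=value" part
        case "$name" in
          *_*) printf '%s\n' "$name" ;;
        esac
        ;;
    esac
  done
}

SUSPECT="$(find_underscore_flags \
  --module-name trainer.task \
  --package-path trainer \
  --region us-central1 \
  --runtime_version=1.0 \
  -- \
  --output_path "gs://bucket/training" \
  --classification_type "multilabel")"

echo "$SUSPECT"   # prints: --runtime_version
```

Only `--runtime_version` is reported; `--output_path` and `--classification_type` sit after the `--` and are treated as the trainer's own arguments.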