Cannot create gcloud instance - google-cloud-platform

Following https://course.fast.ai/start_gcp.html, this is the setup:
export IMAGE_FAMILY="pytorch-latest-gpu" # or "pytorch-latest-cpu" for non-GPU instances
export ZONE="us-west2-b" # budget: "us-west1-b"
export INSTANCE_NAME="my-fastai-instance"
export INSTANCE_TYPE="n1-highmem-8" # budget: "n1-highmem-4"
# budget: 'type=nvidia-tesla-k80,count=1'
gcloud compute instances create $INSTANCE_NAME \
--zone=$ZONE \
--image-family=$IMAGE_FAMILY \
--image-project=deeplearning-platform-release \
--maintenance-policy=TERMINATE \
--accelerator="type=nvidia-tesla-p100,count=1" \
--machine-type=$INSTANCE_TYPE \
--boot-disk-size=200GB \
--metadata="install-nvidia-driver=True" \
--preemptible
Got this error:
(gcloud.compute.instances.create) Could not fetch resource:
- The resource 'projects/xxxxxx/zones/us-west2-b/acceleratorTypes/nvidia-tesla-p100' was not found
Anyone?

I tried replicating the steps you followed from the tutorial and got the same error.
According to Google's documentation, the NVIDIA Tesla P100 is only available in these zones:
us-west1-a
us-west1-b
us-central1-c
us-central1-f
us-east1-b
us-east1-c
europe-west1-b
europe-west1-d
europe-west4-a
asia-east1-a
asia-east1-c
australia-southeast1-c
You selected us-west2-b, which is not on that list.
Therefore, I would just change your zone to one of the zones mentioned above.
To get this list programmatically, for example with the Cloud SDK, you can run:
gcloud compute accelerator-types list --filter "name=nvidia-tesla-p100" --format "table[box,title=Zones](zone:sort=1)" 2>/dev/null
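For example, reusing the variables from the question and assuming us-west1-b fits your budget and GPU quota, only the zone needs to change before re-running the same command:
export ZONE="us-west1-b" # any zone from the list above
gcloud compute instances create $INSTANCE_NAME \
--zone=$ZONE \
--image-family=$IMAGE_FAMILY \
--image-project=deeplearning-platform-release \
--maintenance-policy=TERMINATE \
--accelerator="type=nvidia-tesla-p100,count=1" \
--machine-type=$INSTANCE_TYPE \
--boot-disk-size=200GB \
--metadata="install-nvidia-driver=True" \
--preemptible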

The error you are reporting is caused by this GPU not being available in the zone “us-west2-b”; you can review which zones offer each GPU model in the official documentation.
In this case, the zones closest to the region you are using where the P100 is available are:
us-west1-a
us-west1-b
Regards.

Related

GCP Cloud Logging Cost increasing with Dataproc img version 2.0.39-ubuntu18

I have a Dataproc cluster with image version 2.0.39-ubuntu18, which seems to be putting all logs into Cloud Logging, and this is increasing our costs a lot.
When creating the cluster I added spark:spark.eventLog.dir=gs://dataproc-spark-logs/joblogs and spark:spark.history.fs.logDirectory=gs://dataproc-spark-logs/joblogs
to stop using Cloud Logging, however that is not working: logs are still being redirected to Cloud Logging as well.
Here is the command used to create the Dataproc cluster :
REGION=us-east1
ZONE=us-east1-b
IMG_VERSION=2.0-ubuntu18
NUM_WORKER=3
# in versa-sml-googl
gcloud beta dataproc clusters create $CNAME \
--enable-component-gateway \
--bucket $BUCKET \
--region $REGION \
--zone $ZONE \
--no-address --master-machine-type $TYPE \
--master-boot-disk-size 100 \
--master-boot-disk-type pd-ssd \
--num-workers $NUM_WORKER \
--worker-machine-type $TYPE \
--worker-boot-disk-type pd-ssd \
--worker-boot-disk-size 500 \
--image-version $IMG_VERSION \
--autoscaling-policy versa-dataproc-autoscaling \
--scopes 'https://www.googleapis.com/auth/cloud-platform' \
--project $PROJECT \
--initialization-actions 'gs://dataproc-spark-configs/pip_install.sh','gs://dataproc-spark-configs/connectors-feb1.sh' \
--metadata 'gcs-connector-version=2.0.0' \
--metadata 'bigquery-connector-version=1.2.0' \
--properties 'dataproc:dataproc.logging.stackdriver.job.driver.enable=true,dataproc:job.history.to-gcs.enabled=true,spark:spark.dynamicAllocation.enabled=false,spark:spark.executor.instances=6,spark:spark.executor.cores=2,spark:spark.eventLog.dir=gs://dataproc-spark-logs/joblogs,spark:spark.history.fs.logDirectory=gs://dataproc-spark-logs/joblogs,spark:spark.jars.packages=org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2'
We have another Dataproc cluster (image version 1.4.37-ubuntu18) with a similar configuration, but it does not seem to use Cloud Logging nearly as much.
Attached is screenshot properties of both the clusters.
What do I need to change to ensure the Dataproc (PySpark) jobs do not use Cloud Logging?
tia!
I saw dataproc:dataproc.logging.stackdriver.job.driver.enable is set to true. By default, the value is false, which means driver logs will be saved to GCS and streamed back to the client for viewing, but it won't be saved to Cloud Logging. You can try disabling it. BTW, when it is enabled, the job driver logs will be available in Cloud Logging under the job resource (instead of the cluster resource).
If you want to disable Cloud Logging completely for a cluster, you can either add dataproc:dataproc.logging.stackdriver.enable=false when creating the cluster or write an init action with systemctl stop google-fluentd.service. Both will stop Cloud Logging on the cluster's side, but using the property is recommended.
See Dataproc cluster properties for the property.
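A minimal sketch of both options, using a hypothetical cluster name and region (the property and the fluentd service name are the ones mentioned above; everything else is a placeholder):
# Option 1: disable Cloud Logging on the cluster via a property at creation time
gcloud dataproc clusters create my-cluster \
--region us-east1 \
--properties 'dataproc:dataproc.logging.stackdriver.enable=false'

# Option 2: an init action, e.g. gs://my-bucket/stop-logging.sh, containing the single line
#   systemctl stop google-fluentd.service
# and passed to the cluster via --initialization-actions 'gs://my-bucket/stop-logging.sh'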
Here is the update on this (based on discussions with GCP Support) :
In Cloud Logging, we need to create a Log Router sink with an inclusion filter; this will write the logs to BigQuery or Cloud Storage, depending on the destination you specify.
Additionally, the _Default sink needs to be modified to add exclusion filters so those specific logs will NOT be routed to Cloud Logging's default bucket.
Attached are screenshots of the _Default log sink and the Inclusion sink for Dataproc.
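As a rough sketch of that setup in gcloud (the sink name, BigQuery dataset, and filters are placeholders, and older gcloud versions may require editing the _Default sink's exclusions in the Console instead):
# Route Dataproc job logs to a BigQuery dataset through a dedicated sink
gcloud logging sinks create dataproc-job-logs \
bigquery.googleapis.com/projects/$PROJECT/datasets/dataproc_logs \
--log-filter='resource.type="cloud_dataproc_job"' \
--project $PROJECT

# Keep the same logs out of the _Default sink with an exclusion filter
gcloud logging sinks update _Default \
--add-exclusion=name=exclude-dataproc-jobs,filter='resource.type="cloud_dataproc_job"' \
--project $PROJECT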

creating dataproc cluster with multiple jars

I am trying to create a Dataproc cluster that will connect Dataproc to Pub/Sub. I need to add multiple jars on cluster creation via the spark.jars property:
gcloud dataproc clusters create cluster-2c76 --region us-central1 --zone us-central1-f --master-machine-type n1-standard-4 \
--master-boot-disk-size 500 \
--num-workers 2 \
--worker-machine-type n1-standard-4 \
--worker-boot-disk-size 500 \
--image-version 1.4-debian10 \
--properties spark:spark.jars=gs://bucket/jars/spark-streaming-pubsub_2.11-2.4.0.jar,gs://bucket/jars/google-oauth-client-1.31.0.jar,gs://bucket/jars/google-cloud-datastore-2.2.0.jar,gs://bucket/jars/pubsublite-spark-sql-streaming-0.2.0.jar spark:spark.driver.memory=3000m \
--initialization-actions gs://goog-dataproc-initialization-actions-us-central1/connectors/connectors.sh \
--metadata spark-bigquery-connector-version=0.21.0 \
--scopes=pubsub,datastore
I get this error:
ERROR: (gcloud.dataproc.clusters.create) argument --properties: Bad syntax for dict arg: [gs://gregalr/jars/spark-streaming-pubsub_2.11-2.3.4.jar]. Please see `gcloud topic flags-file` or `gcloud topic escaping` for information on providing list or dictionary flag values with special characters.
This looked promising, but it fails.
If there is a better way to connect Dataproc to Pub/Sub, please share.
The answer you linked is the correct way to do it: How can I include additional jars when starting a Google DataProc cluster to use with Jupyter notebooks?
If you also post the command you tried with the escaping syntax and the resulting error message, others could more easily verify what you did wrong. It looks like you're specifying an additional Spark property, spark:spark.driver.memory=3000m, alongside your list of jars, and tried to just space-separate it inside the --properties value, which isn't allowed.
Per the linked answer, you'd need to assign a different separator character and use it to separate the second Spark property:
--properties=^#^spark:spark.jars.packages=artifact1,artifact2,artifact3#spark:spark.driver.memory=3000m
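Applied to the command in the question, that would look roughly like this (# is the custom separator declared by the ^#^ prefix; the jar paths are the ones you listed):
gcloud dataproc clusters create cluster-2c76 --region us-central1 --zone us-central1-f --master-machine-type n1-standard-4 \
--master-boot-disk-size 500 \
--num-workers 2 \
--worker-machine-type n1-standard-4 \
--worker-boot-disk-size 500 \
--image-version 1.4-debian10 \
--properties '^#^spark:spark.jars=gs://bucket/jars/spark-streaming-pubsub_2.11-2.4.0.jar,gs://bucket/jars/google-oauth-client-1.31.0.jar,gs://bucket/jars/google-cloud-datastore-2.2.0.jar,gs://bucket/jars/pubsublite-spark-sql-streaming-0.2.0.jar#spark:spark.driver.memory=3000m' \
--initialization-actions gs://goog-dataproc-initialization-actions-us-central1/connectors/connectors.sh \
--metadata spark-bigquery-connector-version=0.21.0 \
--scopes=pubsub,datastore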

Google Cloud Functions with VPC Serverless Connector Egress with Cloud NAT not working

This is related to the following questions, which are outdated:
Possible to get static IP address for Google Cloud Functions?
Google Cloud - Egress IP / NAT / Proxy for google cloud functions
Currently GCP has Serverless VPC Access, which allows you to route all traffic from a Cloud Function through a VPC connector and, combined with Cloud NAT, get static IP addresses.
I have followed the following guide https://cloud.google.com/functions/docs/networking/network-settings#associate-static-ip using the region us-east4 but external requests from my cloud function always timed out.
I'm not sure this is a bug or I have missed something.
Edit:
To make sure I have followed everything, I did all the steps using gcloud commands where possible. These commands are copied from GCP's guides.
Setting project id for future use
PROJECT_ID=my-test-gcf-vpc-nat
Go to Console and enable billing
Set up a VPC and a test VM to test Cloud NAT
gcloud services enable compute.googleapis.com \
--project $PROJECT_ID
gcloud compute networks create custom-network1 \
--subnet-mode custom \
--project $PROJECT_ID
gcloud compute networks subnets create subnet-us-east-192 \
--network custom-network1 \
--region us-east4 \
--range 192.168.1.0/24 \
--project $PROJECT_ID
gcloud compute instances create nat-test-1 \
--image-family debian-9 \
--image-project debian-cloud \
--network custom-network1 \
--subnet subnet-us-east-192 \
--zone us-east4-c \
--no-address \
--project $PROJECT_ID
gcloud compute firewall-rules create allow-ssh \
--network custom-network1 \
--source-ranges 35.235.240.0/20 \
--allow tcp:22 \
--project $PROJECT_ID
Created IAP SSH permissions using Console
Test network config, the VM should not have internet access without Cloud NAT
gcloud compute ssh nat-test-1 \
--zone us-east4-c \
--command "curl -s ifconfig.io" \
--tunnel-through-iap \
--project $PROJECT_ID
The command timed out, as expected without Cloud NAT.
Set up Cloud NAT
gcloud compute routers create nat-router \
--network custom-network1 \
--region us-east4 \
--project $PROJECT_ID
gcloud compute routers nats create nat-config \
--router-region us-east4 \
--router nat-router \
--nat-all-subnet-ip-ranges \
--auto-allocate-nat-external-ips \
--project $PROJECT_ID
Test network config again, the VM should have internet access with Cloud NAT
gcloud compute ssh nat-test-1 \
--zone us-east4-c \
--command "curl -s ifconfig.io" \
--tunnel-through-iap \
--project $PROJECT_ID
The command responded with an IP address.
Created VPC Access Connector
gcloud services enable vpcaccess.googleapis.com \
--project $PROJECT_ID
gcloud compute networks vpc-access connectors create custom-network1-us-east4 \
--network custom-network1 \
--region us-east4 \
--range 10.8.0.0/28 \
--project $PROJECT_ID
gcloud compute networks vpc-access connectors describe custom-network1-us-east4 \
--region us-east4 \
--project $PROJECT_ID
Added permissions for Google Cloud Functions Service Account
gcloud services enable cloudfunctions.googleapis.com \
--project $PROJECT_ID
PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member=serviceAccount:service-$PROJECT_NUMBER@gcf-admin-robot.iam.gserviceaccount.com \
--role=roles/viewer
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member=serviceAccount:service-$PROJECT_NUMBER@gcf-admin-robot.iam.gserviceaccount.com \
--role=roles/compute.networkUser
There were suggestions that I should add additional firewall rules and service-account permissions:
# Additional Firewall Rules
gcloud compute firewall-rules create custom-network1-allow-http \
--network custom-network1 \
--source-ranges 0.0.0.0/0 \
--allow tcp:80 \
--project $PROJECT_ID
gcloud compute firewall-rules create custom-network1-allow-https \
--network custom-network1 \
--source-ranges 0.0.0.0/0 \
--allow tcp:443 \
--project $PROJECT_ID
# Additional Permission, actually this service account has an Editor role already.
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member=serviceAccount:$PROJECT_ID@appspot.gserviceaccount.com \
--role=roles/compute.networkUser
Deployed test Cloud Functions
index.js
const publicIp = require('public-ip')

exports.testVPC = async (req, res) => {
  const v4 = await publicIp.v4()
  const v6 = await publicIp.v6()
  console.log('ip', [v4, v6])
  return res.end(JSON.stringify([v4, v6]))
}

exports.testNoVPC = exports.testVPC
# Cloud Function with VPC Connector
gcloud functions deploy testVPC \
--runtime nodejs10 \
--trigger-http \
--vpc-connector custom-network1-us-east4 \
--egress-settings all \
--region us-east4 \
--allow-unauthenticated \
--project $PROJECT_ID
# Cloud Function without VPC Connector
gcloud functions deploy testNoVPC \
--runtime nodejs10 \
--trigger-http \
--region us-east4 \
--allow-unauthenticated \
--project $PROJECT_ID
The Cloud Function without the VPC Connector responded with an IP address:
https://us-east4-my-test-gcf-vpc-nat.cloudfunctions.net/testNoVPC
The Cloud Function with the VPC Connector timed out:
https://us-east4-my-test-gcf-vpc-nat.cloudfunctions.net/testVPC
1. Configure a sample Cloud NAT setup with Compute Engine. Use the Compute Engine VM to test whether your Cloud NAT settings work.
2. Configure Serverless VPC Access. Make sure you create the VPC connector on the custom-network1 made in step 1.
3. Create a Google Cloud Function.
a. Under Networking, choose the connector you created in step 2 and select "Route all traffic through the VPC connector".
import requests
import json
from flask import escape

def hello_http(request):
    response = requests.get('https://stackoverflow.com')
    print(response.headers)
    return 'Accessing stackoverflow from cloud function: {}!'.format(response.headers)
The region for Cloud NAT, the VPC connector, and the Cloud Function is us-central1.
4. Test the function to see if you have access to the internet:
Accessing stackoverflow from cloud function: {'Cache-Control': 'private', 'Content-Type': 'text/html; charset=utf-8', 'Content-Encoding': 'gzip', 'X-Frame-Options': 'SAMEORIGIN', 'X-Request-Guid': 'edf3d1f8-7466-4161-8170-ae4d6e615d5c', 'Strict-Transport-Security': 'max-age=15552000', 'Feature-Policy': "microphone 'none'; speaker 'none'", 'Content-Security-Policy': "upgrade-insecure-requests; frame-ancestors 'self' https://stackexchange.com", 'Content-Length': '26391', 'Accept-Ranges': 'bytes', 'Date': 'Sat, 28 Mar 2020 19:03:17 GMT', 'Via': '1.1 varnish', 'Connection': 'keep-alive', 'X-Served-By': 'cache-mdw17354-MDW', 'X-Cache': 'MISS', 'X-Cache-Hits': '0', 'X-Timer': 'S1585422197.002185,VS0,VE37', 'Vary': 'Accept-Encoding,Fastly-SSL', 'X-DNS-Prefetch-Control': 'off', 'Set-Cookie': 'prov=78ecd1a5-54ea-ab1d-6d19-2cf5dc44a86b; domain=.stackoverflow.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly'}!
Success. Now you can specify a static IP address for NAT.
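For instance, a rough sketch using the router and NAT names from the question (reserve an address in the same region as the router, then attach it to the NAT; the address name is a placeholder):
gcloud compute addresses create nat-static-ip \
--region us-east4 \
--project $PROJECT_ID

gcloud compute routers nats update nat-config \
--router nat-router \
--router-region us-east4 \
--nat-external-ip-pool nat-static-ip \
--project $PROJECT_ID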
Check whether the Cloud NAT router was created in the same VPC used by the Serverless VPC Access connector.
Also check whether the Cloud Function is deployed in the same region as the Cloud Router used by Cloud NAT.
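Two quick checks for that, assuming the resource names used in the question (both commands should print the same network URL, and both resources should be in the region where the function is deployed):
gcloud compute routers describe nat-router \
--region us-east4 \
--format "value(network)" \
--project $PROJECT_ID

gcloud compute networks vpc-access connectors describe custom-network1-us-east4 \
--region us-east4 \
--format "value(network)" \
--project $PROJECT_ID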

GCP - how to change the IP on a VM with 2 NICs?

I have a VM with 2 NICs. For all intents and purposes, it's a VPN server that takes connection requests on one interface and then forwards traffic out to the other interface.
Periodically, I need to change the IP on the second interface, which is easily done via the web interface. I'd like to make this change using GCP scripting tools to make the process less manual.
I have managed to automate all steps except updating the access-config. This is because both interfaces have the same access-config name ("External NAT"). I've been unable to find a way to rename or recreate this access-config name, nor have I found any workaround.
Any input would be greatly appreciated.
- accessConfigs:
  - kind: compute#accessConfig
    name: External NAT
    natIP: ##.##.##.##
    networkTier: STANDARD
    type: ONE_TO_ONE_NAT
  fingerprint: ==========
  kind: compute#networkInterface
  name: nic0
  network: https://www.googleapis.com/compute/v1/projects/#######/global/networks/inbound
  networkIP: 10.#.#.#
  subnetwork: https://www.googleapis.com/compute/v1/projects/#######/regions/northamerica-northeast1/subnetworks/inbound
- accessConfigs:
  - kind: compute#accessConfig
    name: External NAT
    natIP: ##.##.##.##
    networkTier: STANDARD
    type: ONE_TO_ONE_NAT
  fingerprint: =========
  kind: compute#networkInterface
  name: nic1
  network: https://www.googleapis.com/compute/v1/projects/#######/global/networks/outbound
  networkIP: 10.0.2.3
  subnetwork: https://www.googleapis.com/compute/v1/projects/#######/regions/northamerica-northeast1/subnetworks/outbound
I believe (!?) [really am not certain] that you must delete and then create; you can't update an existing access config to change the IP using gcloud.
Someone else please confirm!
PLEASE try this on a sacrificial instance before you use it on the production instance
Thus:
PROJECT=[[YOUR-PROJECT]]
ZONE=[[YOUR-ZONE]]
INSTANCE=[[YOUR-INSTANCE]]
INTERFACE=[[YOUR-INTERFACE]] # Perhaps "nic1"
# Show what we have currently
gcloud compute instances describe ${INSTANCE} \
--zone=${ZONE} --project=${PROJECT} \
--format="yaml(networkInterfaces)"
# Delete the "External NAT" for ${INTERFACE}
gcloud compute instances delete-access-config ${INSTANCE} \
--zone=${ZONE} --project=${PROJECT} \
--network-interface=${INTERFACE} \
--access-config-name="External NAT"
# Show what we have currently **without** "External NAT" for ${INTERFACE}
gcloud compute instances describe ${INSTANCE} \
--zone=${ZONE} --project=${PROJECT} \
--format="yaml(networkInterfaces)"
# Create a new "External NAT" for ${INTERFACE}
# Include --address=ADDRESS if you have one
gcloud compute instances add-access-config ${INSTANCE} \
--zone=${ZONE} --project=${PROJECT} \
--network-interface=${INTERFACE} \
--access-config-name="External NAT"
# Show what we have currently with a **new** "External NAT" for ${INTERFACE}
gcloud compute instances describe ${INSTANCE} \
--zone=${ZONE} --project=${PROJECT} \
--format="yaml(networkInterfaces)"
Update
This was bugging me.
You can filter the describe commands by the ${INTERFACE} value:
gcloud compute instances describe ${INSTANCE} \
--zone=${ZONE} --project=${PROJECT} \
--format="yaml(networkInterfaces[].filter(name:${INTERFACE})"
Because gcloud has proprietary filtering|formatting, it's often better to format as JSON and then use jq. Using jq, we can filter by ${INTERFACE} and return only the "External NAT" access config:
gcloud compute instances describe ${INSTANCE} \
--zone=${ZONE} --project=${PROJECT} \
--format="json" \
| jq -r ".networkInterfaces[]|select(.name==\"${INTERFACE}\")|.accessConfigs[]|select(.name==\"External NAT\")"
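Building on that pipeline, you could, for example, capture the interface's current natIP in a shell variable before running the delete/re-add cycle above (a sketch reusing the same variables):
OLD_IP=$(gcloud compute instances describe ${INSTANCE} \
--zone=${ZONE} --project=${PROJECT} \
--format="json" \
| jq -r ".networkInterfaces[]|select(.name==\"${INTERFACE}\")|.accessConfigs[]|select(.name==\"External NAT\")|.natIP")
# Record the address being replaced
echo "Replacing ${OLD_IP} on ${INTERFACE}"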

How to specify preemptible GPU Deep Learning Virtual Machine on GCP

I can't figure out how to specify a preemptible GPU Deep Learning VM on GCP.
This is what I used:
export IMAGE_FAMILY="tf-latest-gpu"
export ZONE="europe-west4-a"
export INSTANCE_NAME="deeplearning"
gcloud compute instances create $INSTANCE_NAME \
--zone=$ZONE \
--image-family=$IMAGE_FAMILY \
--image-project=deeplearning-platform-release \
--maintenance-policy=TERMINATE \
--accelerator='type=nvidia-tesla-v100,count=2' \
--metadata='install-nvidia-driver=True'
Thank you!
You can create a preemptible Compute Engine instance with a GPU by adding the --preemptible option to the gcloud command. As per your example, that would be:
export IMAGE_FAMILY="tf-latest-gpu"
export ZONE="europe-west4-a"
export INSTANCE_NAME="deeplearning"
gcloud compute instances create $INSTANCE_NAME \
--zone=$ZONE \
--image-family=$IMAGE_FAMILY \
--image-project=deeplearning-platform-release \
--maintenance-policy=TERMINATE \
--accelerator type=nvidia-tesla-v100,count=2 \
--metadata='install-nvidia-driver=True' \
--preemptible
See documentation here and here for more details on available options.
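To confirm the instance was actually created as preemptible, you could check its scheduling settings afterwards (a quick sketch reusing the variables above; it should print True):
gcloud compute instances describe $INSTANCE_NAME \
--zone=$ZONE \
--format="value(scheduling.preemptible)"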