Container-Optimized OS `user-data` metadata - google-cloud-platform

Is there a way to pass the user-data metadata value as a remote script, similar to startup-script-url?
I have written a YAML file using cloud-config and use the --metadata-from-file user-data=config-basic.yaml flag to create a new VM running Container-Optimized OS (COS).
I want to create VMs programmatically; a local file may not be accessible, and passing the whole script content inline as the user-data metadata property is not feasible.
Option 1) Rewrite the cloud-config as a shell script?
Option 2) Find the logic that invokes cloud-config and inject the contents there via metadata?
Option 3) A better option?
https://cloud.google.com/compute/docs/instances/startup-scripts/linux
gcloud compute instances create cos-vertex-gpu \
--image cos-101-17162-40-34 \
--image-project cos-cloud \
--boot-disk-size 100 \
--machine-type n1-standard-4 \
--zone us-west1-a \
--metadata="google-logging-enabled=true,google-monitoring-enabled=true" \
--metadata-from-file user-data=config-basic.yaml \
--maintenance-policy=TERMINATE \
--accelerator=type=nvidia-tesla-t4,count=1

I was not able to find a way to pass a remote cloud-config programmatically.
I ended up rewriting my cloud-config as a shell script and passing it using startup-script-url.
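Another workaround, if you want to keep the cloud-config format: --metadata-from-file only reads local files, so a provisioning script can first fetch the remote YAML to a temporary file and pass that. A minimal sketch, assuming the config is stored in a GCS bucket you can read (gs://my-bucket is a placeholder; the GPU and monitoring flags from the original command are omitted for brevity):
# Fetch the remote cloud-config, then pass it as user-data.
CONFIG=$(mktemp)
gsutil cp gs://my-bucket/config-basic.yaml "${CONFIG}"
gcloud compute instances create cos-vertex-gpu \
--image cos-101-17162-40-34 \
--image-project cos-cloud \
--zone us-west1-a \
--metadata-from-file user-data="${CONFIG}"
rm -f "${CONFIG}"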

Related

GCP Cloud Logging cost increasing with Dataproc image version 2.0.39-ubuntu18

I have a Dataproc cluster with image version 2.0.39-ubuntu18 that seems to be putting all logs into Cloud Logging, which is increasing our costs a lot.
In the command used to create the cluster, I added spark:spark.eventLog.dir=gs://dataproc-spark-logs/joblogs and spark:spark.history.fs.logDirectory=gs://dataproc-spark-logs/joblogs
to stop using Cloud Logging; however, that is not working, and logs are still being redirected to Cloud Logging as well.
Here is the command used to create the Dataproc cluster:
REGION=us-east1
ZONE=us-east1-b
IMG_VERSION=2.0-ubuntu18
NUM_WORKER=3
# in versa-sml-googl
gcloud beta dataproc clusters create $CNAME \
--enable-component-gateway \
--bucket $BUCKET \
--region $REGION \
--zone $ZONE \
--no-address --master-machine-type $TYPE \
--master-boot-disk-size 100 \
--master-boot-disk-type pd-ssd \
--num-workers $NUM_WORKER \
--worker-machine-type $TYPE \
--worker-boot-disk-type pd-ssd \
--worker-boot-disk-size 500 \
--image-version $IMG_VERSION \
--autoscaling-policy versa-dataproc-autoscaling \
--scopes 'https://www.googleapis.com/auth/cloud-platform' \
--project $PROJECT \
--initialization-actions 'gs://dataproc-spark-configs/pip_install.sh','gs://dataproc-spark-configs/connectors-feb1.sh' \
--metadata 'gcs-connector-version=2.0.0' \
--metadata 'bigquery-connector-version=1.2.0' \
--properties 'dataproc:dataproc.logging.stackdriver.job.driver.enable=true,dataproc:job.history.to-gcs.enabled=true,spark:spark.dynamicAllocation.enabled=false,spark:spark.executor.instances=6,spark:spark.executor.cores=2,spark:spark.eventLog.dir=gs://dataproc-spark-logs/joblogs,spark:spark.history.fs.logDirectory=gs://dataproc-spark-logs/joblogs,spark:spark.jars.packages=org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2'
We have another Dataproc cluster (image version 1.4.37-ubuntu18) with a similar configuration, but it does not seem to use Cloud Logging nearly as much.
Screenshots of the properties of both clusters are attached.
What do I need to change to ensure the Dataproc (PySpark) jobs do not use Cloud Logging?
Thanks in advance!
I saw dataproc:dataproc.logging.stackdriver.job.driver.enable is set to true. By default, the value is false, which means driver logs will be saved to GCS and streamed back to the client for viewing, but it won't be saved to Cloud Logging. You can try disabling it. BTW, when it is enabled, the job driver logs will be available in Cloud Logging under the job resource (instead of the cluster resource).
If you want to disable Cloud Logging completely for a cluster, you can either add dataproc:dataproc.logging.stackdriver.enable=false when creating the cluster or write an init action that runs systemctl stop google-fluentd.service. Both will stop Cloud Logging on the cluster's side, but using the property is recommended.
See Dataproc cluster properties for the property.
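For example, a sketch of the creation command with that property added (all other flags from your original command left unchanged and omitted here for brevity):
gcloud beta dataproc clusters create $CNAME \
--region $REGION \
--properties 'dataproc:dataproc.logging.stackdriver.enable=false'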
Here is the update on this (based on discussions with GCP Support):
In GCP Logging, we need to create a log routing sink with an inclusion filter; this will write the logs to BigQuery or Cloud Storage, depending upon the target you specify.
Additionally, the _Default sink needs to be modified to add exclusion filters so specific logs will NOT be redirected to GCP Logging.
Attached are screenshots of the _Default log sink and the Inclusion sink for Dataproc.
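For reference, a sketch of adding such an exclusion to the _Default sink from the CLI; the exclusion name and filter below are illustrative placeholders, so adjust the filter to match the logs you want to drop (check gcloud logging sinks update --help for your SDK version):
gcloud logging sinks update _Default \
--add-exclusion='name=exclude-dataproc,filter=resource.type="cloud_dataproc_cluster"'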

Additional persistent disks are not created when using --source-machine-image with 'gcloud beta compute instances create' CLI command

The following works great, creating a VM from a source image plus additional persistent disk(s).
gcloud compute instances create ${INSTANCE_NAME} \
--image-project ${PROJECT_NAME} \
--image ${BASE_IMAGE_NAME} \
--zone=${ZONE_NAME} \
--create-disk=size=128GB,type=pd-balanced,name=${INSTANCE_NAME}-home,device-name=homedisk
The following, however, creates the VM but no additional disk(s) are created.
gcloud beta compute instances create ${INSTANCE_NAME} \
--source-machine-image ${BASE_IMAGE_NAME} \
--zone=${ZONE_NAME} \
--create-disk=size=128GB,type=pd-balanced,name=${INSTANCE_NAME}-homedisk,device-name=homedisk
The documentation for the command does not suggest that --source-machine-image and --create-disk cannot work in tandem. The documentation on property overrides when creating a VM from a machine image suggests that any of the properties can be overridden.
Any insights as to what might be going on?
The problem here is with the --source-machine-image ${BASE_IMAGE_NAME} flag. A machine image captures everything from the source instance, including its disks, so the new VM is created entirely from BASE_IMAGE_NAME, which does not have an additional disk; that is why none is created. Try creating a new machine image from an instance that already has the desired additional disk attached, then run your gcloud beta compute instances create command again (the second command you have) and confirm that it creates the instance based on that machine image, including the additional disk.
If you need to create a new instance with one additional disk, you should use your first command with --image ${NAME} --image-project ${PROJECT}.
So --source-machine-image and --image ... --image-project are very different.
Here is the documentation for Machine images which may explain this better.
https://cloud.google.com/compute/docs/machine-images
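A sketch of that flow, assuming instance-with-disk is a placeholder for an existing instance that already has the extra disk attached (verify the machine-images flags with gcloud beta compute machine-images create --help):
# Capture a machine image from an instance that already has the additional disk.
gcloud beta compute machine-images create ${BASE_IMAGE_NAME} \
--source-instance=instance-with-disk \
--source-instance-zone=${ZONE_NAME}
# New VMs created from it then inherit that disk.
gcloud beta compute instances create ${INSTANCE_NAME} \
--source-machine-image ${BASE_IMAGE_NAME} \
--zone=${ZONE_NAME}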

mounting disk in GCP compute instance on Container-Optimized OS [COS] created using the gcloud CLI

So I am using this gcloud command to create an instance from a container image:
gcloud compute instances create-with-container test-instance \
--zone us-xx \
--container-image asia.gcr.io/my-project/my-docker-image \
--container-privileged \
--network my-network \
--subnet my-net-sub \
--create-disk name=test-data,device-name=test-data,auto-delete=yes,size=200GB,type=pd-ssd \
--container-mount-disk name=test-data,mount-path=/mnt/disks/data \
--service-account me@myproject.iam.gserviceaccount.com
This works fine and creates the instance, but it does not mount the data disk. Why?
More precisely, to add the data disk I need to:
create a single partition spanning the whole disk and create an ext4 file system on it,
and mount the disk at a given path.
How can I specify the partition-and-format step, and then the mount step?
You can't mount the host's disk into the container (i.e., use the same disk in both). You can, however, mount a directory or another disk. Either way you will be able to store data on it, and both OSes (host and container) will be able to read and write it.
Let's say you want to store all data on the host OS disk in /datadir/ and you want it mounted inside the container under /mnt/disks/data. Below you will find a complete (and tested) example to use:
gcloud compute instances create-with-container mytestvm1 \
--zone=europe-west3-c \
--container-image=gcr.io/google-containers/mycontainer \
--container-privileged \
--network default \
--subnet default \
--create-disk name=test-data,device-name=test-data,auto-delete=yes,size=20GB,type=pd-ssd \
--container-mount-host-path=mount-path=/mnt/disks/data,host-path=/home/myhomedir/,mode=rw \
--service-account=my_service_account@developer.gserviceaccount.com
If you need another disk mounted then just change the line:
--container-mount-host-path=mount-path=/mnt/disks/data,host-path=/home/myhomedir/,mode=rw \
to
--container-mount-disk=mount-path=/mnt/disks/data,name=data1,mode=rw \
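If you still need to format and mount the pd-ssd on the host yourself (the original partition/format question), one approach is a startup script. This is a sketch assuming the disk is blank and was attached with device-name=test-data, which GCE exposes under /dev/disk/by-id/:
# startup-script sketch: format the disk once (if blank), then mount it.
DEV=/dev/disk/by-id/google-test-data
if ! blkid "${DEV}"; then
  mkfs.ext4 -F "${DEV}"
fi
mkdir -p /mnt/disks/data
mount -o discard,defaults "${DEV}" /mnt/disks/data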

How to keep Google Dataproc master running?

I created a cluster on Dataproc and it works great. However, after the cluster is idle for a while (~90 min), the master node automatically stops. This happens to every cluster I create. I see there is a similar question here: Keep running Dataproc Master node
It looks like it's a problem with an initialization action. However, the post does not give me enough information to fix the issue. Below are the commands I used to create the cluster:
gcloud dataproc clusters create $CLUSTER_NAME \
--project $PROJECT \
--bucket $BUCKET \
--region $REGION \
--zone $ZONE \
--master-machine-type $MASTER_MACHINE_TYPE \
--master-boot-disk-size $MASTER_DISK_SIZE \
--worker-boot-disk-size $WORKER_DISK_SIZE \
--num-workers=$NUM_WORKERS \
--initialization-actions gs://dataproc-initialization-actions/connectors/connectors.sh,gs://dataproc-initialization-actions/datalab/datalab.sh \
--metadata gcs-connector-version=$GCS_CONNECTOR_VERSION \
--metadata bigquery-connector-version=$BQ_CONNECTOR_VERSION \
--scopes cloud-platform \
--metadata JUPYTER_CONDA_PACKAGES=numpy:scipy:pandas:scikit-learn \
--optional-components=ANACONDA,JUPYTER \
--image-version=1.3
I need the BigQuery connector, GCS connector, Jupyter, and Datalab for my cluster.
How can I keep my master node running? Thank you.
As summarized in the comment thread, this is indeed caused by Datalab's auto-shutdown feature. There are a couple ways to change this behavior:
Upon first creating the Datalab-enabled Dataproc cluster, log in to Datalab and click on the "Idle timeout in about ..." text to disable it: https://cloud.google.com/datalab/docs/concepts/auto-shutdown#disabling_the_auto_shutdown_timer - The text will change to "Idle timeout is disabled"
Edit the initialization action to set the environment variable as suggested by yelsayed:
function run_datalab(){
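# The -e flag below sets DATALAB_DISABLE_IDLE_TIMEOUT_PROCESS, which starts
# the container without Datalab's idle auto-shutdown timer.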
if docker run -d --restart always --net=host -e "DATALAB_DISABLE_IDLE_TIMEOUT_PROCESS=true" \
-v "${DATALAB_DIR}:/content/datalab" ${VOLUME_FLAGS} datalab-pyspark; then
echo 'Cloud Datalab Jupyter server successfully deployed.'
else
err 'Failed to run Cloud Datalab'
fi
}
And use your custom initialization action instead of the stock gs://dataproc-initialization-actions one. It could also be worth filing a tracking issue in the GitHub repo for Dataproc initialization actions, suggesting to disable the timeout by default or to provide an easy metadata-based option. It's probably true that the auto-shutdown behavior isn't what users expect in default usage on a Dataproc cluster, since the master also performs roles other than running the Datalab service.
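A sketch of swapping in the edited script; gs://my-bucket is a placeholder for your own bucket, and the remaining flags stay as in your original command:
# Upload the modified init action, then reference your copy at creation time.
gsutil cp datalab.sh gs://my-bucket/datalab.sh
gcloud dataproc clusters create $CLUSTER_NAME \
--initialization-actions gs://dataproc-initialization-actions/connectors/connectors.sh,gs://my-bucket/datalab.sh \
--optional-components=ANACONDA,JUPYTER \
--image-version=1.3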

How to use environment variables in Compute Engine on Google Cloud Platform?

I have an application running in Compute Engine on Google Cloud Platform which reads system environmental variables.
I wonder how to set them on my instance so that the application can read them at runtime.
Here is how I create an instance:
gcloud compute instances create ${PROJECT_ID} \
--image-family debian-9 \
--image-project debian-cloud \
--machine-type g1-small \
--scopes "userinfo-email,cloud-platform" \
--metadata-from-file startup-script=${SCRIPT} \
--metadata release-url=${BUCKET_URL} \
--zone ${ZONE} \
--tags http-server
I have some security credentials (API keys, passwords, etc.) that I want to upload to my instance and expose as environment variables for my application to read.
Is there a console, flag, or command available to automate this?
You can do it by connecting over SSH once you have created the instance.
It is explained in set default values in environment variables.
For example, use the export command to set the zone and region variables like:
$ export CLOUDSDK_COMPUTE_ZONE="us-central1-a"
$ export CLOUDSDK_COMPUTE_REGION="us-central1"
To make these environment variables permanent:
Alternative 1: Using the bashrc file
Include the export commands in your ~/.bashrc file; you can use nano or vim to add the variables:
nano ~/.bashrc
Then restart your terminal and check with:
$ env
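For example (API_KEY and its value are placeholders for your own variable):
# Append the variable once, then reload the shell configuration.
echo 'export API_KEY="your-key-here"' >> ~/.bashrc
source ~/.bashrc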
Alternative 2: Using a startup script
You can also use the export command within a startup script to turn your metadata into environment variables.
When creating your instance, you can pass the script inline or via a file like this:
gcloud compute instances create vm-1 \
--metadata-from-file startup-script=$HOME/startup.sh \
--zone=us-west1-a
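A sketch of what $HOME/startup.sh could look like, reading the custom release-url metadata key (set in the question's command) from the metadata server and exporting it for all login shells; writing to /etc/profile.d is one common choice, not the only one:
#! /bin/bash
# Read a custom metadata value from the internal metadata server and
# expose it as an environment variable for every login shell.
RELEASE_URL=$(curl -s -H "Metadata-Flavor: Google" \
"http://metadata.google.internal/computeMetadata/v1/instance/attributes/release-url")
echo "export RELEASE_URL=\"${RELEASE_URL}\"" > /etc/profile.d/app_env.sh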
If the instance is already running, follow the instructions to set a startup script on a running instance.
Please remember that with the startup-script method, the script runs at boot, so you will need to re-run it manually each time you set new variables.
Whatever method you choose, make sure your env output shows the variables correctly.
Better check it again after restarting your instance, either from within the shell or using the stop and start buttons in the console.