How to deploy dask-kubernetes adaptive cluster onto aws kubernetes instance - amazon-web-services

I am attempting to deploy an adaptive dask-kubernetes cluster to my AWS K8s instance (I want to use the kubeControl interface found here). It is unclear to me where and how I execute this code such that it is active on my existing cluster. In addition, I want an ingress rule so that another EC2 instance I have can connect to the cluster and execute code within an AWS VPC, to maintain security and network performance.
So far I have managed to get a functional k8s cluster running with dask and jupyterhub on it. I am using the sample helm chart found here, which references the docker image here. I can see this image does not even install dask-kubernetes. That said, I am able to connect to this cluster from my other EC2 instance using the exposed AWS DNS name and execute custom code, but this is not the Kubernetes-native dask cluster.
I have worked on modifying the deploy YAML for Kubernetes, but it is unclear to me what I would need to change to have it use the proper Kubernetes cluster/schedulers. I do know I need to modify the docker image I am using so it installs dask-kubernetes, but this still does not help me. Below is the sample helm deploy chart I am using:
---
# nameOverride: dask
# fullnameOverride: dask

scheduler:
  name: scheduler
  image:
    repository: "daskdev/dask"
    tag: 2.3.0
    pullPolicy: IfNotPresent
    # See https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
    pullSecrets:
    #  - name: regcred
  replicas: 1
  # serviceType: "ClusterIP"
  # serviceType: "NodePort"
  serviceType: "LoadBalancer"
  servicePort: 8786
  resources: {}
  #  limits:
  #    cpu: 1.8
  #    memory: 6G
  #  requests:
  #    cpu: 1.8
  #    memory: 6G
  tolerations: []
  nodeSelector: {}
  affinity: {}

webUI:
  name: webui
  servicePort: 80

worker:
  name: worker
  image:
    repository: "daskdev/dask"
    tag: 2.3.0
    pullPolicy: IfNotPresent
    # dask_worker: "dask-cuda-worker"
    dask_worker: "dask-worker"
    pullSecrets:
    #  - name: regcred
  replicas: 3
  aptPackages: >-
  default_resources:  # overwritten by resource limits if they exist
    cpu: 1
    memory: "4GiB"
  env:
  #  - name: EXTRA_CONDA_PACKAGES
  #    value: numba xarray -c conda-forge
  #  - name: EXTRA_PIP_PACKAGES
  #    value: s3fs dask-ml --upgrade
  resources: {}
  #  limits:
  #    cpu: 1
  #    memory: 3G
  #    nvidia.com/gpu: 1
  #  requests:
  #    cpu: 1
  #    memory: 3G
  #    nvidia.com/gpu: 1
  tolerations: []
  nodeSelector: {}
  affinity: {}

jupyter:
  name: jupyter
  enabled: true
  image:
    repository: "daskdev/dask-notebook"
    tag: 2.3.0
    pullPolicy: IfNotPresent
    pullSecrets:
    #  - name: regcred
  replicas: 1
  # serviceType: "ClusterIP"
  # serviceType: "NodePort"
  serviceType: "LoadBalancer"
  servicePort: 80
  # This hash corresponds to the password 'dask'
  password: 'sha1:aae8550c0a44:9507d45e087d5ee481a5ce9f4f16f37a0867318c'
  env:
  #  - name: EXTRA_CONDA_PACKAGES
  #    value: "numba xarray -c conda-forge"
  #  - name: EXTRA_PIP_PACKAGES
  #    value: "s3fs dask-ml --upgrade"
  resources: {}
  #  limits:
  #    cpu: 2
  #    memory: 6G
  #  requests:
  #    cpu: 2
  #    memory: 6G
  tolerations: []
  nodeSelector: {}
  affinity: {}

To run a Dask cluster on Kubernetes there are three recommended approaches. Each of these approaches requires you to have an existing Kubernetes cluster and credentials correctly configured (kubectl works locally).
Dask Helm Chart
You can deploy a standalone Dask cluster using the Dask helm chart.
helm repo add dask https://helm.dask.org/
helm repo update
helm install --name my-release dask/dask
Note that this is not an adaptive cluster but you can scale it by modifying the size of the deployment via kubectl.
kubectl scale deployment dask-worker --replicas=10
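Equivalently, you can bump worker.replicas in a values file like the one in the question and apply it with helm upgrade my-release dask/dask -f values.yaml. A minimal sketch of the override (assuming the chart values shown above):
worker:
  replicas: 10   # illustrative; set to the number of workers you want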
Helm Chart Documentation
Python dask-kubernetes API
You can also use dask-kubernetes which is a Python library for creating ad-hoc clusters on the fly.
pip install dask-kubernetes
from dask_kubernetes import KubeCluster
cluster = KubeCluster()
cluster.scale(10) # specify number of nodes explicitly
cluster.adapt(minimum=1, maximum=100) # or dynamically scale based on current workload
This will create a Dask cluster from scratch and will tear it down when the cluster object is garbage collected (most likely on exit).
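If you need the workers to run a specific image or resource shape (for example the daskdev/dask:2.3.0 image used by the chart above), dask-kubernetes can also build the cluster from a pod template, loaded with cluster = KubeCluster.from_yaml('worker-spec.yaml'). A minimal sketch of a hypothetical worker-spec.yaml; the resource values are illustrative:
# worker-spec.yaml (illustrative; adjust image and sizing to your needs)
kind: Pod
metadata:
  labels:
    app: dask-worker
spec:
  restartPolicy: Never
  containers:
    - name: dask-worker
      image: daskdev/dask:2.3.0
      args: [dask-worker, --nthreads, '1', --memory-limit, 4GB, --death-timeout, '60']
      resources:
        requests:
          cpu: 1
          memory: 4G
        limits:
          cpu: 1
          memory: 4G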
dask-kubernetes Documentation
Dask Gateway
Dask Gateway provides a secure, multi-tenant server for managing Dask clusters.
To get started on Kubernetes you need to create a Helm configuration file (config.yaml) with a gateway proxy token.
gateway:
  proxyToken: "<RANDOM TOKEN>"
Hint: You can generate a suitable token with openssl rand -hex 32.
Then install the chart.
helm repo add dask-gateway https://dask.org/dask-gateway-helm-repo/
helm repo update
helm install --values config.yaml my-release dask-gateway/dask-gateway
Dask Gateway Documentation

Related

How do you specify a custom service.yaml for cloud run in a cloudbuild.yaml?

I have a docker container deployed on Google's Cloud Run service. It has a very basic cloudbuild.yaml file that triggers from a git push to the main branch.
I wish to automatically increase the RAM of the Cloud Run machine from 512 MB to 8 GB. I know this is possible in the Cloud Run UI by clicking "EDIT & DEPLOY NEW REVISION" and then manually selecting 8 GB, but I would like to have this set up automatically.
You can fetch the .yaml from Cloud Run by:
gcloud run services describe SERVICE --format export > service.yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  annotations:
    client.knative.dev/user-image: 'gcr.io/project/service:ebbe555'
    run.googleapis.com/ingress: all
    run.googleapis.com/ingress-status: all
    run.googleapis.com/launch-stage: BETA
  labels:
    cloud.googleapis.com/location: europe-north1
  name: service
  namespace: '467851153648'
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/maxScale: '100'
        autoscaling.knative.dev/minScale: '1'
        client.knative.dev/user-image: 'gcr.io/project/service:ebbe555'
        run.googleapis.com/client-name: gcloud
        run.googleapis.com/client-version: 378.0.0
        run.googleapis.com/execution-environment: gen2
      name: faq-engine-00004-vov
    spec:
      containerConcurrency: 80
      containers:
        - image: 'gcr.io/project/service:ebbe555'
          ports:
            - containerPort: 8081
              name: http1
          resources:
            limits:
              cpu: 4000m
              memory: 8Gi
      serviceAccountName: service@project.iam.gserviceaccount.com
      timeoutSeconds: 300
  traffic:
    - latestRevision: true
      percent: 100
And you can replace the current .yaml semi-automatically with:
gcloud run services replace service.yaml
However, is there any way to make the actual Cloud Build load the custom service.yaml in the "Deploy container image to Cloud Run" step?
cloudbuild.yaml
timeout: 1800s
substitutions:
  _SERVICE_NAME: service
  _REGION: europe-north1
images:
  - 'gcr.io/${PROJECT_ID}/${_SERVICE_NAME}:${SHORT_SHA}'
options:
  machineType: N1_HIGHCPU_32
  dynamic_substitutions: true
steps:
  - id: Build the container image
    name: gcr.io/cloud-builders/docker
    args:
      - build
      - '-t'
      - 'gcr.io/${PROJECT_ID}/${_SERVICE_NAME}:${SHORT_SHA}'
      - .
  - id: Push the container image to Container Registry
    name: gcr.io/cloud-builders/docker
    args:
      - push
      - 'gcr.io/${PROJECT_ID}/${_SERVICE_NAME}:${SHORT_SHA}'
  - id: Deploy container image to Cloud Run
    name: gcr.io/google.com/cloudsdktool/cloud-sdk
    entrypoint: gcloud
    args:
      - run
      - deploy
      - '${_SERVICE_NAME}'
      - '--platform'
      - managed
      - '--region'
      - '${_REGION}'
      - '--allow-unauthenticated'
      - '--service-account'
      - '${_SERVICE_NAME}@${PROJECT_ID}.iam.gserviceaccount.com'
      - '--image'
      - 'gcr.io/${PROJECT_ID}/${_SERVICE_NAME}:${SHORT_SHA}'
Thanks!
Posting comments from @GuillaumeBlaqueire and @Lsbister as a community wiki for increased visibility:
To deploy a Cloud Run service, use either the YAML (service.yaml) with gcloud run services replace OR the gcloud command gcloud run deploy. You can't use the service YAML with the "deploy" action.
If you only want to set the memory of your container to 8GiB using the deploy command, you should use the --memory flag for that.
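For example, the deploy step of the cloudbuild.yaml above could pass the sizing flags directly (a sketch; the 8Gi memory and 4 CPU values mirror the service.yaml shown earlier and are assumptions about your sizing):
  - id: Deploy container image to Cloud Run
    name: gcr.io/google.com/cloudsdktool/cloud-sdk
    entrypoint: gcloud
    args:
      - run
      - deploy
      - '${_SERVICE_NAME}'
      - '--image'
      - 'gcr.io/${PROJECT_ID}/${_SERVICE_NAME}:${SHORT_SHA}'
      - '--region'
      - '${_REGION}'
      # sizing flags replace the manual "EDIT & DEPLOY NEW REVISION" step
      - '--memory'
      - 8Gi
      - '--cpu'
      - '4'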

GKE how to use existing compute engine disk as persistent volumes?

I might have to rebuild the GKE cluster, but the Compute Engine disks won't be deleted and need to be re-used as persistent volumes for the pods. I haven't found documentation showing how to link an existing GCP Compute Engine disk as a persistent volume for a pod.
Is it possible to use existing GCP Compute Engine disks with GKE storage classes and persistent volumes?
Yes, it's possible to reuse a Persistent Disk as a Persistent Volume for another cluster, however there is one limitation:
The persistent disk must be in the same zone as the cluster nodes.
If the PD is in a different zone, the cluster will not find the disk.
The documentation Using preexisting persistent disks as PersistentVolumes has information and examples of how to reuse persistent disks.
If you haven't created the Persistent Disk yet, you can create it based on the Creating and attaching a disk documentation. For these tests, I've used the disk below:
gcloud compute disks create pd-name \
  --size 10G \
  --type pd-standard \
  --zone europe-west3-b
If you create a PD smaller than 200G you will get the warning below; whether that matters depends on your needs. In zone europe-west3-b, the pd-standard type can have a size between 10GB and 65536GB.
You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#performance.
Keep in mind that you might get different types of Persistent Disk on different zones. For more details you can check Disk Types documentation or run $ gcloud compute disk-types list.
Once you have Persistent Disk you can create PersistentVolume and PersistentVolumeClaim.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv
spec:
  storageClassName: "test"
  capacity:
    storage: 10G
  accessModes:
    - ReadWriteOnce
  claimRef:
    namespace: default
    name: pv-claim
  gcePersistentDisk:
    pdName: pd-name
    fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-claim
spec:
  storageClassName: "test"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10G
---
kind: Pod
apiVersion: v1
metadata:
  name: task-pv-pod
spec:
  volumes:
    - name: task-pv-storage
      persistentVolumeClaim:
        claimName: pv-claim
  containers:
    - name: task-pv-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/data"
          name: task-pv-storage
Tests
$ kubectl get pv,pvc,pod
NAME                   CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS   REASON   AGE
persistentvolume/pv    10G        RWO            Retain           Bound    default/pv-claim   test                    22s

NAME                             STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/pv-claim   Bound    pv       10G        RWO            test           22s

NAME              READY   STATUS    RESTARTS   AGE
pod/task-pv-pod   1/1     Running   0          21s
Write some information to disk
$ kubectl exec -ti task-pv-pod -- bin/bash
root@task-pv-pod:/# cd /usr/share/nginx/html
root@task-pv-pod:/usr/share/nginx/html# echo "This is test message from Nginx pod" >> message.txt
Now I removed all previous resources: pv, pvc and pod.
$ kubectl get pv,pvc,pod
No resources found
Now, if I recreate the pv and pvc with small changes in the pod, for example using busybox:
containers:
  - name: busybox
    image: busybox
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo hello; sleep 10;done"]
    volumeMounts:
      - mountPath: "/usr/data"
        name: task-pv-storage
It will be rebound
$ kubectl get pv,pvc,po
NAME                   CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS   REASON   AGE
persistentvolume/pv    10G        RWO            Retain           Bound    default/pv-claim                           43m

NAME                             STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/pv-claim   Bound    pv       10G        RWO                           43m

NAME          READY   STATUS    RESTARTS   AGE
pod/busybox   1/1     Running   0          3m43s
And in the busybox pod I am able to find message.txt.
$ kubectl exec -ti busybox -- bin/sh
/ # cd usr
/ # cd usr/data
/usr/data # ls
lost+found message.txt
/usr/data # cat message.txt
This is test message from Nginx pod
As additional information, you won't be able to use the disk in two clusters at the same time; if you try, you will get this error:
AttachVolume.Attach failed for volume "pv" : googleapi: Error 400: RESOURCE_IN_USE_BY_ANOTHER_RESOURCE - The disk resource 'projects/<myproject>/zones/europe-west3-b/disks/pd-name' is already being used by 'projects/<myproject>/zones/europe-west3-b/instances/gke-cluster-3-default-pool-bb545f05-t5hc'

How to get AWS kops based kubernetes cluster IP address to connect with gitlab CICD pipeline

I am trying to create a basic GitLab CI/CD pipeline which will deploy my node.js based backend to an AWS kops based k8s cluster. For that I have created a gitlab-ci.yml file which will be used to deploy the whole CI/CD pipeline; however, I am confused about how to get the Kubernetes cluster IP address so I can use it in gitlab-ci.yml, as in kubectl config set-cluster k8s --server="$CLUSTER_ADDRESS",
where I want CLUSTER_ADDRESS to be configured with GitLab in gitlab-ci.yml.
Any help would be appreciated.
variables:
  DOCKER_DRIVER: overlay2
  REGISTRY: $CI_REGISTRY
  IMAGE_TAG: $CI_REGISTRY_IMAGE
  K8S_DEPLOYMENT_NAME: deployment/$CI_PROJECT_NAME
  CONTAINER_NAME: $CI_PROJECT_NAME

stages:
  - build
  - build-docker
  - deploy

build-docker:
  image: docker:latest
  stage: build-docker
  services:
    - docker:dind
  tags:
    - privileged
  only:
    - Test
  script:
    - docker login -u gitlab-ci-token -p $CI_BUILD_TOKEN $REGISTRY
    - docker build --network host -t $IMAGE_NAME:$IMAGE_TAG -t $IMAGE_NAME:latest .
    - docker push $IMAGE_NAME:$IMAGE_TAG
    - docker push $IMAGE_NAME:latest

deploy-k8s-(stage):
  image:
    name: kubectl:latest
    entrypoint: [""]
  stage: deploy
  tags:
    - privileged
  # Optional: Manual gate
  when: manual
  dependencies:
    - build-docker
  script:
    - kubectl config set-cluster k8s --server="$CLUSTER_ADDRESS"
    - kubectl config set clusters.k8s.certificate-authority-data $CA_AUTH_DATA
    - kubectl config set-credentials gitlab-service-account --token=$K8S_TOKEN
    - kubectl config set-context default --cluster=k8s --user=gitlab-service-account --namespace=default
    - kubectl config use-context default
    - kubectl set image $K8S_DEPLOYMENT_NAME $CI_PROJECT_NAME=$IMAGE_TAG
    - kubectl rollout restart $K8S_DEPLOYMENT_NAME
If your current kubeconfig context is set to the cluster in question, you can run the following to get the cluster address you want:
kubectl config view --minify --raw \
--output 'jsonpath={.clusters[0].cluster.server}'
You can add --context <cluster name> if not.
In most cases this will be https://api.<cluster name>.

Unable to deploy the image in the kubernetes (AWS)

I am stuck at the last step and cannot figure out the mistake. Everything is working fine, but while deploying the image on the cluster I am getting the error shown below.
The image is in Docker Hub; from the AWS instance I used docker login and provided the credentials as well.
sudo kops validate cluster --state=s3://kops-storage-54321 -o yaml
Output:
Using cluster from kubectl context: tests.k8s.local

nodes:
- hostname: ip-172-20-40-124.us-east-2.compute.internal
  name: ip-172-20-40-124.us-east-2.compute.internal
  role: master
  status: "True"
  zone: us-east-2a
- hostname: ip-172-20-112-165.us-east-2.compute.internal
  name: ip-172-20-112-165.us-east-2.compute.internal
  role: node
  status: "True"
  zone: us-east-2c
- hostname: ip-172-20-60-168.us-east-2.compute.internal
  name: ip-172-20-60-168.us-east-2.compute.internal
  role: node
  status: "True"
  zone: us-east-2a
Docker Login :
sudo docker login
Authenticating with existing credentials...
WARNING! Your password will be stored unencrypted in /home/ubuntu/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
While deploying the image I get the error below.
Command:
ubuntu@ip-172-31-30-176:~$ sudo kubectl create deployment magicalnginx --image=amitranjan007/magicalnginx
Error:
error: no matches for extensions/, Kind=Deployment
You can check which API group serves a given Kubernetes object using
$ kubectl api-resources | grep deployment
deployments   deploy   apps   true   Deployment
This means that only an apiVersion from the apps group is valid for Deployments (the extensions group no longer serves Deployment) as of Kubernetes 1.16.
Change apiVersion to apps/v1 in the deployment YAML.
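A minimal manifest for the image in the question might then look like this (a sketch; the name, labels and replica count are illustrative):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: magicalnginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: magicalnginx   # selector must match the pod template labels below
  template:
    metadata:
      labels:
        app: magicalnginx
    spec:
      containers:
        - name: magicalnginx
          image: amitranjan007/magicalnginx
          ports:
            - containerPort: 80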

vsystem-vrep of vora at Waiting: CrashLoopBackOff

Trying to set up Vora 2 on an AWS kops k8s cluster.
The pod vsystem-vrep cannot start.
In the logfile on the node I see:
sudo cat vsystem-vrep_30.log
{"log":"2018-03-27 12:54:04.164349|+0000|INFO |Starting Kernel NFS Server||vrep|1|Start|server.go(41)\u001e\n","stream":"stderr","time":"2018-03-27T12:54:04.164897827Z"}
{"log":"2018-03-27 12:54:04.164405|+0000|INFO |Creating directory /exports||dir-handler|1|makeDir|dir_handler.go(40)\u001e\n","stream":"stderr","time":"2018-03-27T12:54:04.164919387Z"}
{"log":"2018-03-27 12:54:04.164423|+0000|INFO |Listening for private API on port 8738||vrep|18|func1|server.go(45)\u001e\n","stream":"stderr","time":"2018-03-27T12:54:04.164923893Z"}
{"log":"2018-03-27 12:54:04.166992|+0000|INFO |Configuring Kernel NFS Server||vrep|1|configure|server.go(126)\u001e\n","stream":"stderr","time":"2018-03-27T12:54:04.167109138Z"}
{"log":"2018-03-27 12:54:04.219089|+0000|INFO |Configuring Kernel NFS Server||vrep|1|configure|server.go(126)\u001e\n","stream":"stderr","time":"2018-03-27T12:54:04.219235263Z"}
{"log":"2018-03-27 12:54:04.230256|+0000|FATAL|Error starting NFS server: RPC service for NFS server has not been correctly registered||vrep|1|main|server.go(51)\u001e\n","stream":"stderr","time":"2018-03-27T12:54:04.230526346Z"}
How can I solve this?
When installing Vora 2.1 in AWS with kops, you first need to set up a RWX (ReadWriteMany) storage class, which is needed by vsystem (the default AWS storage class only supports ReadWriteOnce). During installation, you need to point to that storage class using the parameter --vsystem-storage-class. Additionally, the parameter --vsystem-load-nfs-modules needs to be set. I suspect that the error happened because that last parameter was missing.
An example of how a call to install.sh would look:
./install.sh --accept-license --deployment-type=cloud --namespace=xxx \
  --docker-registry=123456789.dkr.ecr.us-west-1.amazonaws.com \
  --vora-admin-username=xxx --vora-admin-password=xxx \
  --cert-domain=my.host.domain.com --interactive-security-configuration=no \
  --vsystem-storage-class=aws-efs --vsystem-load-nfs-modules
A RWX storage class can, for example, be created as follows:
Create an EFS file system in the same region as the kops cluster - see https://us-west-2.console.aws.amazon.com/efs/home?region=us-west-2#/filesystems
Create file system
Select VPC of kops cluster
Add kops master and worker security groups to mount target
Optionally give it a name (e.g. same as your kops cluster, to know what it is used for)
Use default options for the remaining
Once created, note the DNS name (similar to fs-1234e567.efs.us-west-2.amazonaws.com).
Create a persistent volume and storage class for Vora.
E.g. use YAML files similar to those below and point them to the newly created EFS file system.
$ cat create_pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: vsystem-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: aws-efs
  nfs:
    path: /
    server: fs-1234e567.efs.us-west-2.amazonaws.com
$ cat create_sc.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: aws-efs
provisioner: xyz.com/aws-efs
kubectl create -f create_pv.yaml
kubectl create -f create_sc.yaml
# check if the newly created pv and sc exist
kubectl get pv
kubectl get storageclasses
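For completeness, a PersistentVolumeClaim that binds to this class and volume could look roughly like the following (a sketch; the Vora installer normally creates its own claims, so the name and size here are only illustrative):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vsystem-pvc   # hypothetical name for illustration
spec:
  storageClassName: aws-efs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi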