I have a k8s deployment which consists of a cron job (runs hourly), service (runs the http service) and a storage class (pvc to store data, using gp2).
The issue I am seeing is that gp2 only supports ReadWriteOnce.
I notice when the cron job creates a job and it lands on the same node as the service it can mount it fine.
Is there something I can do in the service, deployment or cron job yaml to ensure the cron job and service always land on the same node? It can be any node but as long as cron job goes to the same node as service.
This isn't an issue in my lower environment, as we have very few nodes, but in our production environments, where we have more nodes, it is an issue.
In short, I want the pod created by my cron job's job to run on the same node as my service's pod.
I know this isn't best practice, but our web service reads data from the PVC and serves it. The cron job pulls new data in from other sources and leaves it for the web server.
Happy for other ideas / ways.
Thanks
Focusing only on the part:
How can I schedule a workload (Pod, Job, CronJob) on a specific set of Nodes?
You can schedule your CronJob/Job with either:
nodeSelector
nodeAffinity
nodeSelector
nodeSelector is the simplest recommended form of node selection constraint. nodeSelector is a field of PodSpec. It specifies a map of key-value pairs. For the pod to be eligible to run on a node, the node must have each of the indicated key-value pairs as labels (it can have additional labels as well). The most common usage is one key-value pair.
-- Kubernetes.io: Docs: Concepts: Scheduling eviction: Assign pod node: Node selector
An example could look like the following (assuming that your node has the label referenced in .spec.jobTemplate.spec.template.spec.nodeSelector):
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          nodeSelector: # <-- IMPORTANT
            schedule: "here" # <-- IMPORTANT
          containers:
          - name: hello
            image: busybox
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
Applying the above manifest will schedule the CronJob's Pods on a node that has the schedule=here label:
$ kubectl get pods -o wide
NAME                     READY   STATUS      RESTARTS   AGE     IP          NODE        NOMINATED NODE   READINESS GATES
hello-1616323740-mqdmq   0/1     Completed   0          2m33s   10.4.2.67   node-ffb5   <none>           <none>
hello-1616323800-wv98r   0/1     Completed   0          93s     10.4.2.68   node-ffb5   <none>           <none>
hello-1616323860-66vfj   0/1     Completed   0          32s     10.4.2.69   node-ffb5   <none>           <none>
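Since the goal in the question is for the service and the cron job to share a node, the same nodeSelector could also be placed on the service's Deployment. This is only a minimal sketch: the Deployment name, the app label and the image are hypothetical placeholders, and it assumes a node labeled schedule=here as in the example above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-service            # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-service         # hypothetical label
  template:
    metadata:
      labels:
        app: web-service
    spec:
      nodeSelector:
        schedule: "here"       # same node label the CronJob selects on
      containers:
      - name: web
        image: nginx           # placeholder image
```

Note that this pins both workloads to one node only if exactly one node carries the label. If several nodes are labeled, the two Pods may still land on different nodes.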
nodeAffinity
Node affinity is conceptually similar to nodeSelector -- it allows you to constrain which nodes your pod is eligible to be scheduled on, based on labels on the node.
There are currently two types of node affinity, called requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution. You can think of them as "hard" and "soft" respectively, in the sense that the former specifies rules that must be met for a pod to be scheduled onto a node (just like nodeSelector but using a more expressive syntax), while the latter specifies preferences that the scheduler will try to enforce but will not guarantee.
-- Kubernetes.io: Docs: Concepts: Scheduling eviction: Assign pod node: Node affinity
An example could look like the following (assuming that your node has the label referenced in .spec.jobTemplate.spec.template.spec.affinity.nodeAffinity):
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          # --- nodeAffinity part
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: schedule
                    operator: In
                    values:
                    - here
          # --- nodeAffinity part
          containers:
          - name: hello
            image: busybox
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
$ kubectl get pods -o wide
NAME                     READY   STATUS      RESTARTS   AGE     IP           NODE        NOMINATED NODE   READINESS GATES
hello-1616325840-5zkbk   0/1     Completed   0          2m14s   10.4.2.102   node-ffb5   <none>           <none>
hello-1616325900-lwndf   0/1     Completed   0          74s     10.4.2.103   node-ffb5   <none>           <none>
hello-1616325960-j9kz9   0/1     Completed   0          14s     10.4.2.104   node-ffb5   <none>           <none>
Additional resources:
Kubernetes.io: Docs: Concepts: Overview: Working with objects: Labels
I'd reckon you could also take a look at this Stack Overflow answer:
Stackoverflow.com: Questions: Kubernetes PVC with readwritemany on AWS
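If labeling nodes by hand is not desirable, inter-pod affinity is another option that may fit the question more directly, because it asks the scheduler to place the job's Pod on whichever node already runs the service's Pod. The sketch below is an assumption-laden illustration rather than a tested setup: the names data-sync and service-data and the app: web-service label are hypothetical and would have to match the labels and claim actually used by the service.

```yaml
apiVersion: batch/v1beta1        # same API version as the examples above; newer clusters use batch/v1
kind: CronJob
metadata:
  name: data-sync                # hypothetical name
spec:
  schedule: "0 * * * *"          # hourly, as described in the question
  jobTemplate:
    spec:
      template:
        spec:
          affinity:
            podAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchLabels:
                    app: web-service            # hypothetical label on the service's Pods
                topologyKey: kubernetes.io/hostname   # "same node" topology
          containers:
          - name: data-sync
            image: busybox
            command: ["/bin/sh", "-c", "date > /data/last-sync"]
            volumeMounts:
            - name: data
              mountPath: /data
          volumes:
          - name: data
            persistentVolumeClaim:
              claimName: service-data           # hypothetical PVC shared with the service
          restartPolicy: OnFailure
```

Because the rule is a hard requirement, the job's Pod stays Pending whenever no Pod with the matching label is running.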
I am deploying Elasticsearch 7.10.1 to AWS EKS Fargate but I got below error when running them:
ERROR: [2] bootstrap checks failed
[1]: max number of threads [1024] for user [elasticsearch] is too low, increase to at least [4096]
[2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
The solutions I found are "max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]" and "Elasticsearch: Max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]".
But both require a change on the host machine. I am using EKS Fargate, which means I don't have access to the Kubernetes cluster host machine. What else can I do to solve this issue?
Your best bet is to set these via privileged init containers within your Elasticsearch pod/deployment/statefulset, for example:
apiVersion: v1
kind: Pod
metadata:
  name: elasticsearch-node
spec:
  initContainers:
  - name: increase-vm-max-map
    image: busybox
    command: ["sysctl", "-w", "vm.max_map_count=262144"]
    securityContext:
      privileged: true
  - name: increase-fd-ulimit
    image: busybox
    command: ["sh", "-c", "ulimit -n 65536"]
    securityContext:
      privileged: true
  containers:
  - name: elasticsearch-node
...
You could also do this through DaemonSets, although DaemonSets aren't well suited to one-time tasks (it's possible to hack around this).
The init container approach, however, guarantees that your expected settings are in effect right before the Elasticsearch container is launched.
I might have to rebuild the GKE cluster, but the Compute Engine disks won't be deleted and need to be reused as persistent volumes for the pods. I haven't found documentation showing how to link existing GCP Compute Engine disks as persistent volumes for the pods.
Is it possible to use the existing GCP compute engine disks with GKE storage class and Persistent volumes?
Yes, it's possible to reuse a Persistent Disk as a PersistentVolume in another cluster; however, there is one limitation:
The persistent disk must be in the same zone as the cluster nodes.
If the PD is in a different zone, the cluster will not find the disk.
The documentation page Using preexisting persistent disks as PersistentVolumes has information and examples on how to reuse persistent disks.
If you haven't created a Persistent Disk yet, you can create one by following the Creating and attaching a disk documentation. For these tests, I used the disk below:
gcloud compute disks create pd-name \
--size 10G \
--type pd-standard \
--zone europe-west3-b
If you create a PD smaller than 200GB you will get the warning below; whether that matters depends on your needs. In zone europe-west3-b, the pd-standard type can hold between 10GB and 65536GB.
You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#performance.
Keep in mind that different zones may offer different Persistent Disk types. For more details, check the Disk Types documentation or run $ gcloud compute disk-types list.
Once you have a Persistent Disk, you can create the PersistentVolume and PersistentVolumeClaim.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv
spec:
  storageClassName: "test"
  capacity:
    storage: 10G
  accessModes:
    - ReadWriteOnce
  claimRef:
    namespace: default
    name: pv-claim
  gcePersistentDisk:
    pdName: pd-name
    fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-claim
spec:
  storageClassName: "test"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10G
---
kind: Pod
apiVersion: v1
metadata:
  name: task-pv-pod
spec:
  volumes:
    - name: task-pv-storage
      persistentVolumeClaim:
        claimName: pv-claim
  containers:
    - name: task-pv-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/data"
          name: task-pv-storage
Tests
$ kubectl get pv,pvc,pod
NAME                  CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS   REASON   AGE
persistentvolume/pv   10G        RWO            Retain           Bound    default/pv-claim   test                    22s

NAME                             STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/pv-claim   Bound    pv       10G        RWO            test           22s

NAME              READY   STATUS    RESTARTS   AGE
pod/task-pv-pod   1/1     Running   0          21s
Write some information to the disk (under the /usr/data mount path):
$ kubectl exec -ti task-pv-pod -- /bin/bash
root@task-pv-pod:/# cd /usr/data
root@task-pv-pod:/usr/data# echo "This is test message from Nginx pod" >> message.txt
Now I removed all previous resources: pv, pvc and pod.
$ kubectl get pv,pvc,pod
No resources found
Now, if I recreate the pv and pvc and make a small change to the pod, for example switching it to busybox:
  containers:
  - name: busybox
    image: busybox
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo hello; sleep 10;done"]
    volumeMounts:
    - mountPath: "/usr/data"
      name: task-pv-storage
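For reference, combining this fragment with the volumes section of the earlier Pod gives roughly the following manifest. It is a reconstruction, not the exact file used in the test: the Pod name busybox is taken from the kubectl output that follows, and everything else is assumed to be unchanged from the nginx Pod.

```yaml
kind: Pod
apiVersion: v1
metadata:
  name: busybox                  # name as it appears in the kubectl output
spec:
  volumes:
  - name: task-pv-storage
    persistentVolumeClaim:
      claimName: pv-claim        # same claim as in the first test
  containers:
  - name: busybox
    image: busybox
    command: ["/bin/sh"]
    args: ["-c", "while true; do echo hello; sleep 10;done"]
    volumeMounts:
    - mountPath: "/usr/data"     # same mount path, so message.txt shows up here
      name: task-pv-storage
```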
The volume will be rebound:
$ kubectl get pv,pvc,po
NAME                  CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS   REASON   AGE
persistentvolume/pv   10G        RWO            Retain           Bound    default/pv-claim                           43m

NAME                             STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/pv-claim   Bound    pv       10G        RWO                           43m

NAME          READY   STATUS    RESTARTS   AGE
pod/busybox   1/1     Running   0          3m43s
And in the busybox pod I can find message.txt:
$ kubectl exec -ti busybox -- /bin/sh
/ # cd usr/data
/usr/data # ls
lost+found message.txt
/usr/data # cat message.txt
This is test message from Nginx pod
As additional information, you won't be able to use the disk in two clusters at the same time. If you try, you will get this error:
AttachVolume.Attach failed for volume "pv" : googleapi: Error 400: RESOURCE_IN_USE_BY_ANOTHER_RESOURCE - The disk resource 'projects/<myproject>/zones/europe-west3-b/disks/pd-name' is already being used by 'projects/<myproject>/zones/europe-west3-b/instances/gke-cluster-3-default-pool-bb545f05-t5hc'
I'm trying to create AWS ALB-Ingress through EKS following the steps in the document https://docs.aws.amazon.com/eks/latest/userguide/alb-ingress.html
I was successful up to step 7, creating the controller:
[ec2-user@ip-X-X-X-X eks-cluster]$ kubectl apply -f v2_0_0_full.yaml
customresourcedefinition.apiextensions.k8s.io/targetgroupbindings.elbv2.k8s.aws created
mutatingwebhookconfiguration.admissionregistration.k8s.io/aws-load-balancer-webhook created
Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
serviceaccount/aws-load-balancer-controller configured
role.rbac.authorization.k8s.io/aws-load-balancer-controller-leader-election-role created
clusterrole.rbac.authorization.k8s.io/aws-load-balancer-controller-role created
rolebinding.rbac.authorization.k8s.io/aws-load-balancer-controller-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/aws-load-balancer-controller-rolebinding created
service/aws-load-balancer-webhook-service created
deployment.apps/aws-load-balancer-controller created
certificate.cert-manager.io/aws-load-balancer-serving-cert created
issuer.cert-manager.io/aws-load-balancer-selfsigned-issuer created
validatingwebhookconfiguration.admissionregistration.k8s.io/aws-load-balancer-webhook created
However, the controller does NOT get to "Ready" status:
[ec2-user@ip-X-X-X-X eks-cluster]$ kubectl get deployment -n kube-system aws-load-balancer-controller
NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
aws-load-balancer-controller   0/1     1            0           29m
I'm also able to list the pod associated with the controller which also shows NOT READY:
[ec2-user@ip-X-X-X-X eks-cluster]$ kubectl get pods -n kube-system
NAME                                            READY   STATUS    RESTARTS   AGE
aws-load-balancer-controller-XXXXXXXXXX-p4l7f   0/1     Pending   0          30m
I also can't seem to get its logs in order to try and debug the issue:
[ec2-user@ip-X-X-X-X eks-cluster]$ kubectl -n kube-system logs aws-load-balancer-controller-XXXXXXXXXX-p4l7f
[ec2-user@ip-X-X-X-X eks-cluster]$
Furthermore, the /var/log directory also does not have any related logs.
Please help me understand why it is not reaching the READY state. Also let me know how to enable logging to debug this kind of issue.
I found the answer here. A Fargate deployment requires the region and vpc-id.
helm upgrade -i aws-load-balancer-controller eks/aws-load-balancer-controller \
  --set clusterName=<cluster-name> \
  --set serviceAccount.create=false \
  --set region=<region-code> \
  --set vpcId=<vpc-xxxxxxxx> \
  --set serviceAccount.name=aws-load-balancer-controller \
  -n kube-system
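The same settings can also be kept in a values file instead of a chain of --set flags. The sketch below simply mirrors the keys from the command above; the file name values.yaml is an arbitrary choice, and the angle-bracket placeholders still have to be replaced with real values.

```yaml
# values.yaml (hypothetical file name), mirroring the --set flags above
clusterName: <cluster-name>
region: <region-code>              # required on Fargate, per the answer above
vpcId: <vpc-xxxxxxxx>              # required on Fargate, per the answer above
serviceAccount:
  create: false                    # reuse the existing service account
  name: aws-load-balancer-controller
```

It would then be passed with helm's -f flag, for example: helm upgrade -i aws-load-balancer-controller eks/aws-load-balancer-controller -n kube-system -f values.yaml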
From the current LB controller manifest I found that the LB controller Pod specification has no readiness probe, only a liveness probe. That means the Pod is reported Ready as soon as its container is running:
livenessProbe:
  failureThreshold: 2
  httpGet:
    path: /healthz
    port: 61779
    scheme: HTTP
  initialDelaySeconds: 30
  timeoutSeconds: 10
But as we can see in the following output, LB controller's Pod is in Pending state:
[ec2-user@ip-X-X-X-X eks-cluster]$ kubectl get pods -n kube-system
NAME                                            READY   STATUS    RESTARTS   AGE
aws-load-balancer-controller-XXXXXXXXXX-p4l7f   0/1     Pending   0          30m
If a Pod stays in the Pending state, it means that kube-scheduler is unable to bind the Pod to a cluster node for some reason.
Kube-scheduler is the part of the Kubernetes control plane that is responsible for assigning Pods to Nodes.
No Pod logs exist at this phase, because the Pod's containers have not been started yet.
The most convenient way to check the reason is using the kubectl describe command:
kubectl describe pod/podname -n namespacename
At the bottom of the output there is a list of events related to the Pod life cycle. Here is an example for a generic Ubuntu Pod:
Events:
  Type    Reason     Age                From               Message
  ----    ------     ----               ----               -------
  Normal  Scheduled  37s                default-scheduler  Successfully assigned default/ubuntu to k8s-w1
  Normal  Pulling    25s (x2 over 35s)  kubelet, k8s-w1    Pulling image "ubuntu"
  Normal  Pulled     23s (x2 over 30s)  kubelet, k8s-w1    Successfully pulled image "ubuntu"
  Normal  Created    23s (x2 over 30s)  kubelet, k8s-w1    Created container ubuntu
  Normal  Started    23s (x2 over 29s)  kubelet, k8s-w1    Started container ubuntu
The kubectl get events command can also show the problem. For example:
LAST SEEN   TYPE     REASON      OBJECT       MESSAGE
21s         Normal   Scheduled   pod/ubuntu   Successfully assigned default/ubuntu to k8s-w1
9s          Normal   Pulling     pod/ubuntu   Pulling image "ubuntu"
7s          Normal   Pulled      pod/ubuntu   Successfully pulled image "ubuntu"
7s          Normal   Created     pod/ubuntu   Created container ubuntu
7s          Normal   Started     pod/ubuntu   Started container ubuntu
or there could be a reason why Scheduler can't assign Pod to a Node:
"No nodes are available that match all of the predicates: Insufficient cpu (2), Insufficient memory (2)".
In some cases errors could be found in kube-scheduler Pod logs in kube-system namespace. The logs could be listed using the following command:
kubectl logs $(kubectl get pods -l component=kube-scheduler,tier=control-plane -n kube-system -o name) -n kube-system
The most common reasons why a Pod isn't scheduled are the following:
The Nodes lack the CPU or memory resources requested by the Pod.
The Pod cannot tolerate the Taints on the Nodes.
The Pod has an Affinity/AntiAffinity configuration that prevents it from being scheduled.
Storage or other specific resource (like GPU) requirements in the Pod spec cannot be satisfied.
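To make the list above more concrete, the sketch below marks where each of those constraints lives in a Pod spec. It is purely illustrative: the Pod name, the dedicated=infra taint, the data-claim PVC and all values are hypothetical, and none of them come from the LB controller manifest.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scheduling-demo            # hypothetical name
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    resources:
      requests:                    # reason 1: some node must have this much free CPU/memory
        cpu: "250m"
        memory: "128Mi"
  tolerations:                     # reason 2: needed if the nodes carry a matching taint
  - key: "dedicated"               # hypothetical taint key
    operator: "Equal"
    value: "infra"
    effect: "NoSchedule"
  affinity:                        # reason 3: hard affinity rules narrow the eligible nodes
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - linux
  volumes:                         # reason 4: the claim must exist and be bindable
  - name: data
    persistentVolumeClaim:
      claimName: data-claim        # hypothetical PVC
```

Comparing these fields with the events shown by kubectl describe usually points to the constraint that cannot be met.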
I am rather stuck on the Launching worker nodes step of the AWS EKS guide. And to be honest, at this point, I don't know what's wrong.
When I do kubectl get svc, I get my cluster so that's good news.
I have this in my aws-auth-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::Account:role/rolename
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
Here is my config in .kube
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: CERTIFICATE
    server: server
  name: arn:aws:eks:region:account:cluster/clustername
contexts:
- context:
    cluster: arn:aws:eks:region:account:cluster/clustername
    user: arn:aws:eks:region:account:cluster/clustername
  name: arn:aws:eks:region:account:cluster/clustername
current-context: arn:aws:eks:region:account:cluster/clustername
kind: Config
preferences: {}
users:
- name: arn:aws:eks:region:account:cluster/clustername
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      args:
      - token
      - -i
      - clustername
      command: aws-iam-authenticator.exe
I have launched an EC2 instance with the advised AMI.
Some things to note :
I launched my cluster with the CLI,
I created some Key Pair,
I am not using the Cloudformation Stack,
I attached those policies to the role of my EC2 : AmazonEKS_CNI_Policy, AmazonEC2ContainerRegistryReadOnly, AmazonEKSWorkerNodePolicy.
It is my first attempt at kubernetes and EKS, so please keep that in mind :). Thanks for your help!
Your config file and auth file look right. Maybe there is an issue with the security group assignments? Can you share the exact steps that you followed to create the cluster and the worker nodes?
Also, is there any special reason why you had to use the CLI instead of the console? If it's your first attempt at EKS, you should probably try to set up a cluster using the console at least once.
Sometimes, for whatever reason, the aws-auth ConfigMap is not applied automatically, so it has to be added manually. I had this issue, so I'm leaving this here in case it helps someone.
Check to see if you have already applied the aws-auth ConfigMap.
kubectl describe configmap -n kube-system aws-auth
If you receive an error stating "Error from server (NotFound): configmaps "aws-auth" not found", then proceed with the following steps.
Download the configuration map.
curl -o aws-auth-cm.yaml https://s3.us-west-2.amazonaws.com/amazon-eks/cloudformation/2020-10-29/aws-auth-cm.yaml
Open the file with your favorite text editor. Replace <ARN of instance role (not instance profile)> with the Amazon Resource Name (ARN) of the IAM role associated with your nodes, and save the file.
Apply the configuration.
kubectl apply -f aws-auth-cm.yaml
Watch the status of your nodes and wait for them to reach the Ready status.
kubectl get nodes --watch
You can also go to your AWS console and see the worker node being added.
Find more info here
I'm trying to set up Vora 2 on an AWS kops k8s cluster.
The pod vsystem-vrep cannot start.
In the logfile on the node I see:
sudo cat vsystem-vrep_30.log
{"log":"2018-03-27 12:54:04.164349|+0000|INFO |Starting Kernel NFS Server||vrep|1|Start|server.go(41)\u001e\n","stream":"stderr","time":"2018-03-27T12:54:04.164897827Z"}
{"log":"2018-03-27 12:54:04.164405|+0000|INFO |Creating directory /exports||dir-handler|1|makeDir|dir_handler.go(40)\u001e\n","stream":"stderr","time":"2018-03-27T12:54:04.164919387Z"}
{"log":"2018-03-27 12:54:04.164423|+0000|INFO |Listening for private API on port 8738||vrep|18|func1|server.go(45)\u001e\n","stream":"stderr","time":"2018-03-27T12:54:04.164923893Z"}
{"log":"2018-03-27 12:54:04.166992|+0000|INFO |Configuring Kernel NFS Server||vrep|1|configure|server.go(126)\u001e\n","stream":"stderr","time":"2018-03-27T12:54:04.167109138Z"}
{"log":"2018-03-27 12:54:04.219089|+0000|INFO |Configuring Kernel NFS Server||vrep|1|configure|server.go(126)\u001e\n","stream":"stderr","time":"2018-03-27T12:54:04.219235263Z"}
{"log":"2018-03-27 12:54:04.230256|+0000|FATAL|Error starting NFS server: RPC service for NFS server has not been correctly registered||vrep|1|main|server.go(51)\u001e\n","stream":"stderr","time":"2018-03-27T12:54:04.230526346Z"}
How can I solve this?
When installing Vora 2.1 in AWS with kops, you first need to set up an RWX storage class, which is needed by vsystem (the default AWS storage class only supports ReadWriteOnce). During installation, you need to point to that storage class using the parameter --vsystem-storage-class. Additionally, the parameter --vsystem-load-nfs-modules needs to be set. I suspect that the error happened because that last parameter was missing.
Here is an example of what a call of install.sh would look like:
./install.sh --accept-license --deployment-type=cloud --namespace=xxx \
  --docker-registry=123456789.dkr.ecr.us-west-1.amazonaws.com \
  --vora-admin-username=xxx --vora-admin-password=xxx \
  --cert-domain=my.host.domain.com --interactive-security-configuration=no \
  --vsystem-storage-class=aws-efs --vsystem-load-nfs-modules
An RWX storage class can, for example, be created as follows:
Create an EFS file system in the same region as the kops cluster - see https://us-west-2.console.aws.amazon.com/efs/home?region=us-west-2#/filesystems
Create file system
Select VPC of kops cluster
Add kops master and worker security groups to mount target
Optionally give it a name (e.g. same as your kops cluster, to know what it is used for)
Use default options for the remaining
Once created, note the DNS name (similar to fs-1234e567.efs.us-west-2.amazonaws.com).
Create persistent volume and storage class for Vora
E.g. use yaml files similar to below and point to the newly created EFS file system.
$ cat create_pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: vsystem-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: aws-efs
  nfs:
    path: /
    server: fs-1234e567.efs.us-west-2.amazonaws.com
$ cat create_sc.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: aws-efs
provisioner: xyz.com/aws-efs
kubectl create -f create_pv.yaml
kubectl create -f create_sc.yaml
# check that the newly created pv and sc exist
kubectl get pv
kubectl get storageclasses
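As an optional sanity check before running the installer, a throwaway claim against the aws-efs class can show whether the volume is bindable with ReadWriteMany. This is only a sketch under assumptions: the claim name vsystem-test-claim is hypothetical, and it is not something the Vora installer itself requires. The claim should be deleted again afterwards so that the PV is free for vsystem.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vsystem-test-claim       # hypothetical name, for testing only
spec:
  storageClassName: aws-efs      # must match the PV's storageClassName
  accessModes:
    - ReadWriteMany              # the access mode vsystem needs
  resources:
    requests:
      storage: 10Gi              # no larger than the PV's capacity
```

If the claim reaches the Bound status in kubectl get pvc, the EFS-backed volume and the storage class name line up. Because the PV uses the Retain reclaim policy, it will show as Released after the test claim is deleted and may need to be recreated before the installer can bind it.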