I want to run a StatefulSet on AWS EKS Fargate and attach an EFS volume to it, but I am getting errors when mounting the volume in the pod.
This is the error I am getting from kubectl describe pod:
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal LoggingEnabled 114s fargate-scheduler Successfully enabled logging for pod
Normal Scheduled 75s fargate-scheduler Successfully assigned default/app1 to fargate-10.0.2.123
Warning FailedMount 43s (x7 over 75s) kubelet MountVolume.SetUp failed for volume "efs-pv" : rpc error: code = Internal desc = Could not mount "fs-xxxxxxxxxxxxxxxxx:/" at "/var/lib/kubelet/pods/b799a6d6-fe9e-4f80-ac2d-8ccf8834d7c4/volumes/kubernetes.io~csi/efs-pv/mount": mount failed: exit status 1
Mounting command: mount
Mounting arguments: -t efs -o tls fs-xxxxxxxxxxxxxxxxx:/ /var/lib/kubelet/pods/b799a6d6-fe9e-4f80-ac2d-8ccf8834d7c4/volumes/kubernetes.io~csi/efs-pv/mount
Output: Failed to resolve "fs-xxxxxxxxxxxxxxxxx.efs.us-east-1.amazonaws.com" - check that your file system ID is correct, and ensure that the VPC has an EFS mount target for this file system ID.
See https://docs.aws.amazon.com/console/efs/mount-dns-name for more detail.
Attempting to lookup mount target ip address using botocore. Failed to import necessary dependency botocore, please install botocore first.
Warning: config file does not have fall_back_to_mount_target_ip_address_enabled item in section mount.. You should be able to find a new config file in the same folder as current config file /etc/amazon/efs/efs-utils.conf. Consider update the new config file to latest config file. Use the default value [fall_back_to_mount_target_ip_address_enabled = True].
If anyone has set up an EFS volume with an EKS Fargate cluster, please take a look. I have been stuck on this for a long time.
What I have set up
Created an EFS volume
CSIDriver Object
apiVersion: storage.k8s.io/v1beta1
kind: CSIDriver
metadata:
  name: efs.csi.aws.com
spec:
  attachRequired: false
Storage Class
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
PersistentVolume
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: <EFS filesystem ID>
PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 5Gi
Pod Configuration
apiVersion: v1
kind: Pod
metadata:
  name: app1
spec:
  containers:
    - name: app1
      image: busybox
      command: ["/bin/sh"]
      args: ["-c", "while true; do echo $(date -u) >> /data/out1.txt; sleep 5; done"]
      volumeMounts:
        - name: persistent-storage
          mountPath: /data
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: efs-claim
I had the same question literally a day after you and have been working on the error nonstop since then! Did you check to make sure your VPC has DNS hostnames enabled? That is what fixed it for me.
Just an FYI, if you are using Fargate and you want to change this: I had to go as far as deleting the entire cluster after changing the DNS hostnames flag in order for the change to propagate. I'm unsure if you're familiar with the DHCP options of a normal EC2 instance, but usually it takes something like renewing the ipconfig to force the flag to propagate; since Fargate is a managed system, I was unable to find a way to do so from the node itself. I have created another post here attempting to answer that question.
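For reference, a rough sketch of checking and enabling that attribute from the AWS CLI (vpc-xxxxxxxx is a placeholder for your VPC ID):
# Check whether DNS hostnames are enabled for the VPC
aws ec2 describe-vpc-attribute --vpc-id vpc-xxxxxxxx --attribute enableDnsHostnames
# Enable DNS hostnames (DNS support should be enabled as well; each attribute is a separate call)
aws ec2 modify-vpc-attribute --vpc-id vpc-xxxxxxxx --enable-dns-hostnames '{"Value":true}'
aws ec2 modify-vpc-attribute --vpc-id vpc-xxxxxxxx --enable-dns-support '{"Value":true}'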
Another quick FYI: if your pod execution role doesn't have access to EFS, you will need to add a policy that allows access (I just used the AWS-managed AmazonElasticFileSystemFullAccess policy for the time being in order to get things working). Once again, you will have to relaunch your whole cluster to get this role change to propagate, if you haven't already done so!
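For what it's worth, attaching that managed policy can be scripted roughly like this (the role name is a placeholder; use your own pod execution role):
# Attach the AWS-managed EFS policy to the Fargate pod execution role
aws iam attach-role-policy \
  --role-name my-eks-fargate-pod-execution-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonElasticFileSystemFullAccess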
Related
I am trying to mount a PV based on an existing GCP persistent disk onto my pod as read-only.
My configs look like this (parts of them are masked for confidentiality):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-and-models-pv
  namespace: lipsync
spec:
  storageClassName: ""
  capacity:
    storage: 10Gi
  accessModes:
    - ReadOnlyMany
  claimRef:
    namespace: lipsync
    name: data-and-models-pvc
  # csi:
  #   driver: pd.csi.storage.gke.io
  #   volumeHandle: projects/***/zones/***/disks/g-lipsync-data-and-models
  gcePersistentDisk:
    pdName: g-lipsync-data-and-models
    fsType: ext4
    readOnly: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-and-models-pvc
  namespace: lipsync
spec:
  storageClassName: ""
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 10Gi
And then, in the pod definition:
  volumeMounts:
    - mountPath: /app/models
      subPath: models
      name: data-and-models-v
      readOnly: true
    - [...]
  volumes:
    - name: data-and-models-v
      persistentVolumeClaim:
        claimName: data-and-models-pvc
        readOnly: true
However, when I do kubectl apply, the pod never gets created, and I am met with this event:
0s Warning FailedMount pod/lipsync-api-67c784dfb7-4tlln MountVolume.MountDevice failed for volume "data-and-models-pv" : rpc error: code = Internal desc = Failed to format and mount device from ("/dev/disk/by-id/google-g-lipsync-data-and-models") to ("/var/lib/kubelet/plugins/kubernetes.io/csi/pv/data-and-models-pv/globalmount") with fstype ("ext4") and options ([]): mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t ext4 -o defaults /dev/disk/by-id/google-g-lipsync-data-and-models /var/lib/kubelet/plugins/kubernetes.io/csi/pv/data-and-models-pv/globalmount
Output: mount: /var/lib/kubelet/plugins/kubernetes.io/csi/pv/data-and-models-pv/globalmount: cannot mount /dev/sdb read-only.
If I manually SSH into the VM that acts as the node backing the pod, I can see that adding noload to the mount options lets me mount the disk successfully:
sudo mount -o ro,noload,defaults /dev/sdb .
But I am not aware of any way to make Kubernetes use this extra mount option.
How can I successfully make GKE mount this disk to my pod?
I had the same problem with readOnly mode today. Hope my experience can give you some ideas.
I have two snapshots: one was created by someone else, and the other was created by me. The files on both disks are identical.
But when I mounted the disk provisioned from the snapshot the other person created, readOnly worked fine, while mine kept hitting the same error you have.
So I logged into the pod without readOnly mode to see the difference between the two disks. The only difference I found is that the good one has userId 1003 and groupId 1004, while mine is root:root. So I ran chown -R 1003:1004 ./ on my disk, used it to create a new snapshot, and then it worked smoothly...
I haven't found the reason yet, but it worked at least. Will let you know if I figure it out.
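As an additional avenue (untested on GKE here), Kubernetes PersistentVolumes accept a mountOptions list, so passing noload through the PV spec might be worth a try; a minimal sketch reusing the names from the question:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-and-models-pv
spec:
  storageClassName: ""
  capacity:
    storage: 10Gi
  accessModes:
    - ReadOnlyMany
  mountOptions:   # passed through to mount if the volume plugin supports it
    - ro
    - noload
  gcePersistentDisk:
    pdName: g-lipsync-data-and-models
    fsType: ext4
    readOnly: true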
I have an AWS EFS file system and I have also created an access point: /ap
I want to mount that access point into a Kubernetes deployment, but it fails, although when I use / it works.
These are the manifests I am using.
PV
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 1Mi
  mountOptions:
    - rsize=1048576
    - wsize=1048576
    - hard
    - timeo=600
    - retrans=2
    - noresvport
  persistentVolumeReclaimPolicy: Retain
  nfs:
    path: /ap
    server: fs-xxx.efs.region.amazonaws.com
  claimRef:
    name: efs-pvc
    namespace: product
PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-pvc
  namespace: product
spec:
  storageClassName: ""
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
And I receive this when starting the deployment:
Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[data default-token-qwclp]: timed out waiting for the condition
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/97e66236-cb08-4bee-82a7-6f6cf1db9353/volumes/kubernetes.io~nfs/efs-pv --scope -- mount -t nfs -o hard,noresvport,retrans=2,rsize=1048576,timeo=600,wsize=1048576 fs-559a4f0e.efs.eu-central-1.amazonaws.com:/atc /var/lib/kubelet/pods/97e66236-cb08-4bee-82a7-6f6cf1db9353/volumes/kubernetes.io~nfs/efs-pv
Output: Running scope as unit run-4806.scope.
mount.nfs: Connection timed out
Am I missing something? Or should I use the CSI driver instead?
I strongly suggest you follow these steps:
Step 1: Deploy the EFS CSI driver on your nodes.
Link: https://github.com/kubernetes-sigs/aws-efs-csi-driver
Step 2: Create a new PV and PVC using this tutorial.
Link: https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/master/examples/kubernetes/volume_path/README.md
Now, if you want to specify a path to a folder, you can follow this example.
Link: https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/master/examples/kubernetes/volume_path/specs/example.yaml
It worked for me.
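If you specifically want to mount the access point rather than a path, the CSI driver lets you reference the access point ID in the volumeHandle. A minimal sketch, assuming fs-xxx and fsap-xxx stand in for your file system and access point IDs:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 1Mi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-xxx::fsap-xxx   # <FileSystemId>::<AccessPointId>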
I installed the EFS CSI driver and got their Static Provisioning example to work: I was able to start a pod that appended to a file on the EFS volume. I could delete the pod and start another one to inspect that file and confirm the data written by the first pod was still there. But what I actually need to do is mount the volume read-only, and I am having no luck there.
Note that after I successfully ran that example, I launched an EC2 instance and in it, I mounted the EFS filesystem, then added the data that my pods need to access in a read-only fashion. Then I unmounted the EFS filesystem and terminated the instance.
Using the configuration below, which is based on the Static Provisioning example referenced above, my pod does not start Running; it remains in ContainerCreating.
Storage class:
$ kubectl get sc efs-sc -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{},"name":"efs-sc"},"provisioner":"efs.csi.aws.com"}
  creationTimestamp: "2020-01-12T05:36:13Z"
  name: efs-sc
  resourceVersion: "809880"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/efs-sc
  uid: 71ecce62-34fd-11ea-8a5f-124f4ee64e8d
provisioner: efs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
Persistent Volume (this is the only PV in the cluster that uses the EFS Storage Class):
$ kubectl get pv efs-pv-ro -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"PersistentVolume","metadata":{"annotations":{},"name":"efs-pv-ro"},"spec":{"accessModes":["ReadOnlyMany"],"capacity":{"storage":"5Gi"},"csi":{"driver":"efs.csi.aws.com","volumeHandle":"fs-26120da7"},"persistentVolumeReclaimPolicy":"Retain","storageClassName":"efs-sc","volumeMode":"Filesystem"}}
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: "2020-01-12T05:36:59Z"
  finalizers:
  - kubernetes.io/pv-protection
  name: efs-pv-ro
  resourceVersion: "810231"
  selfLink: /api/v1/persistentvolumes/efs-pv-ro
  uid: 8d54a80e-34fd-11ea-8a5f-124f4ee64e8d
spec:
  accessModes:
  - ReadOnlyMany
  capacity:
    storage: 5Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: efs-claim-ro
    namespace: default
    resourceVersion: "810229"
    uid: e0498cae-34fd-11ea-8a5f-124f4ee64e8d
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-26120da7
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  volumeMode: Filesystem
status:
  phase: Bound
Persistent Volume Claim (this is the only PVC in the cluster attempting to use the EFS storage class):
$ kubectl get pvc efs-claim-ro -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"efs-claim-ro","namespace":"default"},"spec":{"accessModes":["ReadOnlyMany"],"resources":{"requests":{"storage":"5Gi"}},"storageClassName":"efs-sc"}}
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
  creationTimestamp: "2020-01-12T05:39:18Z"
  finalizers:
  - kubernetes.io/pvc-protection
  name: efs-claim-ro
  namespace: default
  resourceVersion: "810234"
  selfLink: /api/v1/namespaces/default/persistentvolumeclaims/efs-claim-ro
  uid: e0498cae-34fd-11ea-8a5f-124f4ee64e8d
spec:
  accessModes:
  - ReadOnlyMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: efs-sc
  volumeMode: Filesystem
  volumeName: efs-pv-ro
status:
  accessModes:
  - ReadOnlyMany
  capacity:
    storage: 5Gi
  phase: Bound
And here is the Pod. It remains in ContainerCreating and does not switch to Running:
$ kubectl get pod efs-app -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"efs-app","namespace":"default"},"spec":{"containers":[{"args":["infinity"],"command":["sleep"],"image":"centos","name":"app","volumeMounts":[{"mountPath":"/data","name":"persistent-storage","subPath":"mmad"}]}],"volumes":[{"name":"persistent-storage","persistentVolumeClaim":{"claimName":"efs-claim-ro"}}]}}
    kubernetes.io/psp: eks.privileged
  creationTimestamp: "2020-01-12T06:07:08Z"
  name: efs-app
  namespace: default
  resourceVersion: "813420"
  selfLink: /api/v1/namespaces/default/pods/efs-app
  uid: c3b8421b-3501-11ea-b164-0a9483e894ed
spec:
  containers:
  - args:
    - infinity
    command:
    - sleep
    image: centos
    imagePullPolicy: Always
    name: app
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /data
      name: persistent-storage
      subPath: mmad
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-z97dh
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-192-168-254-51.ec2.internal
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: persistent-storage
    persistentVolumeClaim:
      claimName: efs-claim-ro
  - name: default-token-z97dh
    secret:
      defaultMode: 420
      secretName: default-token-z97dh
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-01-12T06:07:08Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-01-12T06:07:08Z"
    message: 'containers with unready status: [app]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-01-12T06:07:08Z"
    message: 'containers with unready status: [app]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-01-12T06:07:08Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: centos
    imageID: ""
    lastState: {}
    name: app
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: ContainerCreating
  hostIP: 192.168.254.51
  phase: Pending
  qosClass: BestEffort
  startTime: "2020-01-12T06:07:08Z"
I am not sure if subPath will work with this configuration or not, but the same problem happens whether or not subPath is in the Pod configuration.
The problem does seem to be with the volume. If I comment out the volumes and volumeMounts section, the pod runs.
It seems that the PVC has bound with the correct PV, but the pod is not starting.
I'm not seeing a clue in any of the output above, but maybe I'm missing something?
Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.8", GitCommit:"211047e9a1922595eaa3a1127ed365e9299a6c23", GitTreeState:"clean", BuildDate:"2019-10-15T12:11:03Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.9-eks-c0eccc", GitCommit:"c0eccca51d7500bb03b2f163dd8d534ffeb2f7a2", GitTreeState:"clean", BuildDate:"2019-12-22T23:14:11Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
aws-efs-csi-driver version: v.0.2.0.
Note that one of the requirements is to have Golang version 1.13.4+ installed, but you have go1.12.12, so you have to update it. If you are upgrading from an older version of Go, you must first remove the existing version.
Take a look here: upgrading-golang.
This driver is supported on Kubernetes version 1.14 and later Amazon EKS clusters and worker nodes. Alpha features of the Amazon EFS CSI Driver are not supported on Amazon EKS clusters.
As for not being able to mount the read-only volume in a Kubernetes pod (using the EFS CSI driver in AWS EKS), try changing the access mode to:
accessModes:
  - ReadWriteMany
You can find more information here: efs-csi-driver.
Make sure that when creating the EFS filesystem, it is accessible from the Kubernetes cluster. This can be achieved by creating the filesystem inside the same VPC as the Kubernetes cluster or by using VPC peering.
Static provisioning - the EFS filesystem needs to be created manually first; then it can be mounted inside a container as a persistent volume (PV) using the driver.
Mount options - mount options can be specified in the persistent volume (PV) to define how the volume should be mounted. Aside from normal mount options, you can also specify tls as a mount option to enable encryption in transit for the EFS filesystem.
Because Amazon EFS is an elastic file system, it does not enforce any file system capacity limits. The actual storage capacity value in persistent volumes and persistent volume claims is not used when creating the file system. However, since storage capacity is a required field in Kubernetes, you must specify a valid value, such as 5Gi in this example. This value does not limit the size of your Amazon EFS file system.
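As a sketch of the mount options point above, tls can be set on the PV like this (fs-xxxxxxxx is a placeholder file system ID):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  mountOptions:
    - tls            # encryption in transit via efs-utils
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-xxxxxxxx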
I am running my Docker containers on a Kubernetes cluster on AWS EKS. Two of my containers use a shared volume, and these containers run in two different pods, so I want a common volume on AWS that can be used by both pods.
I created an EFS volume and mounted it. I am following this link to create a PersistentVolumeClaim, but I am getting a timeout error when the efs-provisioner pod tries to attach the mounted EFS volume. The volume ID and region are correct.
Detailed error message from the pod describe:
timeout expired waiting for volumes to attach or mount for pod "default"/"efs-provisioner-55dcf9f58d-r547q". list of unmounted volumes=[pv-volume]. list of unattached volumes=[pv-volume default-token-lccdw]
MountVolume.SetUp failed for volume "pv-volume" : mount failed: exit status 32
AWS EFS uses an NFS-type volume plugin, and as per Kubernetes Storage Classes, the NFS volume plugin does not come with an internal provisioner like EBS does.
So the steps will be:
Create an external Provisioner for NFS volume plugin.
Create a storage class.
Create one volume claim.
Use volume claim in Deployment.
In the ConfigMap section, change file.system.id: and aws.region: to match the details of the EFS you created.
In the Deployment section, change server: to the DNS endpoint of the EFS you created.
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: efs-provisioner
data:
  file.system.id: yourEFSsystemid
  aws.region: regionyourEFSisin
  provisioner.name: example.com/aws-efs
---
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  name: efs-provisioner
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: efs-provisioner
    spec:
      containers:
        - name: efs-provisioner
          image: quay.io/external_storage/efs-provisioner:latest
          env:
            - name: FILE_SYSTEM_ID
              valueFrom:
                configMapKeyRef:
                  name: efs-provisioner
                  key: file.system.id
            - name: AWS_REGION
              valueFrom:
                configMapKeyRef:
                  name: efs-provisioner
                  key: aws.region
            - name: PROVISIONER_NAME
              valueFrom:
                configMapKeyRef:
                  name: efs-provisioner
                  key: provisioner.name
          volumeMounts:
            - name: pv-volume
              mountPath: /persistentvolumes
      volumes:
        - name: pv-volume
          nfs:
            server: yourEFSsystemID.efs.yourEFSregion.amazonaws.com
            path: /
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: aws-efs
provisioner: example.com/aws-efs
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: efs
  annotations:
    volume.beta.kubernetes.io/storage-class: "aws-efs"
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
For more explanation and details go to https://github.com/kubernetes-incubator/external-storage/tree/master/aws/efs
The problem for me was that I was specifying a path other than / in my PV, and the directory on the NFS server referenced by that path did not yet exist. I had to manually create that directory first.
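In case it helps, one way to create that directory is to temporarily mount the EFS root from an EC2 instance in the same VPC (the file system ID, region, and path are placeholders):
# Mount the root of the EFS filesystem somewhere temporary
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 fs-xxx.efs.region.amazonaws.com:/ /mnt/efs
# Create the directory that the PV's path points at, then unmount
sudo mkdir -p /mnt/efs/your-pv-path
sudo umount /mnt/efs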
The issue was that I had two EC2 instances running, but I had mounted the EFS volume on only one of them, and Kubernetes kept scheduling pods on the instance that did not have the volume mounted. I have now mounted the same volume on both instances and am using a PV and PVC like the ones below, and it is working fine.
ec2 mounting: AWS EFS mounting with EC2
PV.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs
spec:
  capacity:
    storage: 100Mi
  accessModes:
    - ReadWriteMany
  nfs:
    server: efs_public_dns.amazonaws.com
    path: "/"
PVC.yml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: efs
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Mi
replicaset.yml
----- only volume section -----
volumes:
  - name: test-volume
    persistentVolumeClaim:
      claimName: efs
I am trying to deploy a stateful set mounted on a Persistent Volume.
I installed Kubernetes on AWS via kops.
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T12:22:21Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"9", GitVersion:"v1.9.3", GitCommit:"d2835416544f298c919e2ead3be3d0864b52323b", GitTreeState:"clean", BuildDate:"2018-02-07T11:55:20Z", GoVersion:"go1.9.2", Compiler:"gc", Platform:"linux/amd64"}
According to this issue I need to create the PVC first:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: zk-data-claim
spec:
  storageClassName: default
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: zk-logs-claim
spec:
  storageClassName: default
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
The default storage class exists, and the PVC binds a PV successfully:
$ kubectl get sc
NAME PROVISIONER AGE
default kubernetes.io/aws-ebs 20d
gp2 (default) kubernetes.io/aws-ebs 20d
ssd (default) kubernetes.io/aws-ebs 20d
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
zk-data-claim Bound pvc-5584fdf7-3853-11e8-a73b-02bb35448afe 2Gi RWO default 11m
zk-logs-claim Bound pvc-5593e249-3853-11e8-a73b-02bb35448afe 2Gi RWO default 11m
I can see these two volumes in the EC2 EBS volumes list as "available" at first, but they later become "in-use".
And then I use them in my StatefulSet:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: zk
spec:
  serviceName: zk-cluster
  replicas: 3
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      volumes:
        - name: zk-data
          persistentVolumeClaim:
            claimName: zk-data-claim
        - name: zk-logs
          persistentVolumeClaim:
            claimName: zk-logs-claim
      containers:
        ....
          volumeMounts:
            - name: zk-data
              mountPath: /opt/zookeeper/data
            - name: zk-logs
              mountPath: /opt/zookeeper/logs
Which fails with
Unable to mount volumes for pod "zk-0_default(83b8dc93-3850-11e8-a73b-02bb35448afe)": timeout expired waiting for volumes to attach/mount for pod "default"/"zk-0". list of unattached/unmounted volumes=[zk-data zk-logs]
I'm working in the default namespace.
Any ideas what could be causing this failure?
The problem was that my cluster was made with C5 nodes. C5 and M5 nodes follow a different device naming convention (NVMe), and that naming is not recognised.
Recreate the cluster with t2-type nodes.
Yeah, that is a very, very well-known problem with AWS and Kubernetes. Most often it is caused by a stale directory on another node causing the EBS volume to still be "in use" from that node's perspective, so the Linux machine will not release the device when requested by the AWS API. You will see plenty of chatter about it in both kubelet.service journals: on the machine that has the EBS volume and on the machine that wants it.
It has been my experience that only ssh-ing into the Node where the EBS volume is currently attached, finding the mount(s), unmounting them, and then waiting for the exponential back-off timer to expire will solve that :-(
The hand-waving version is:
## cleaning up stale docker containers might not be a terrible idea
docker rm $(docker ps -aq -f status=exited)
## identify any (and there could very well be multiple) mounts
## of the EBS device-name
mount | awk '/dev\/xvdf/ {print $2}' | xargs umount
## or sometimes kubernetes will actually name the on-disk directory ebs~ so:
mount | awk '/ebs~something/{print $2}' | xargs umount
You may experience some success involving lsof in that process, too, but hopefully(!) cleaning up the exited containers will remove the need for such a thing.