Kubernetes Deployment Persistent Volume and VM disk size

After creating an NFS Persistent Volume for one of the Deployments running in a cluster, the containers are able to store and share file data with each other. The file data also persists across container life cycles. And that's great! But I wonder where exactly this file data is stored: where is it "physically" located? Is it saved in the container itself, or somewhere on the disk of the VM that runs the Deployment?
The VM that hosts the Deployment has only 20 GB of disk space available by default. Let's say I am running a Docker container inside a pod on a Node (aka VM) running some file server. What happens if I attempt to transfer a 100 GB file to that file server? Where will this gigantic file be saved if the VM disk itself has only 20 GB of available space?
Edited later to append a portion of the YAML file:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-claim
  labels:
    app: deployment
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment
spec:
  selector:
    matchLabels:
      app: app
  replicas: 1
  minReadySeconds: 10
  strategy:
    type: RollingUpdate # Recreate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: container
          image: 12345.dkr.ecr.us-west-2.amazonaws.com/container:v001
          ports:
            - containerPort: 80
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: volume-mount
              mountPath: /data
      volumes:
        - name: volume-mount
          persistentVolumeClaim:
            claimName: pv-claim

The "physical" location of the volume is determined by the provisioner, which in turn is defined by the storage class. Your PV claim doesn't have a storage class assigned, so the default storage class is used, and that can be anything. I suspect that on EKS the default storage class is backed by EBS, but you should double-check.
First, see what storage class is actually assigned to your persistent volumes:
kubectl get pv -o wide
Then see what provisioner is assigned to that storage class:
kubectl get storageclass
Most likely you will see something like kubernetes.io/aws-ebs. Then look up the documentation for that specific provisioner to understand where the volume is stored "physically".
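To avoid depending on whatever the cluster's default storage class happens to be, the claim can name a class explicitly. A minimal sketch, assuming a storage class called gp2 exists in the cluster (that name is an assumption; use one that kubectl get storageclass actually lists):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-claim
spec:
  # Assumed class name; replace with a class listed by `kubectl get storageclass`.
  storageClassName: gp2
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
```

With the class pinned, the provisioner (and therefore the physical location of the data) no longer changes if the cluster default is changed.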

In your case the data is stored on the NFS share. Connect to the NFS server, browse through its shares, and find the one that is mounted into the pod.

Related

How to use Same EFS for mounting multiple directories in Kubernetes deployment

I am trying to find a solution to make use of the same Amazon EFS for mounting multiple directories in the Kubernetes deployment. Here is my use case
I have an application named app1 that needs to persist a directory named "/opt/templates" to EFS
I have another application named app2 that needs to persist a directory named "/var/logs" to EFS
We deploy the applications as Kubernetes Pods in an Amazon EKS cluster. If I use the same EFS for both of the above mounts, each application can see all the files from both directories, "/opt/templates" and "/var/logs".
How can I use the same EFS for both applications without seeing app1's mounted files in app2's directory? Is it even possible to use the same EFS ID for multiple applications?
Here are the Kubernetes manifests I used for one of the applications, which include the PersistentVolume, PVC and Deployment:
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv-1
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc-report
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-XXXXX
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-pvc-1
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 2Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: deploy1
  template:
    metadata:
      labels:
        app: deploy1
    spec:
      containers:
        - name: app1
          image: imageXXXX
          ports:
            - containerPort: 6455
          volumeMounts:
            - name: temp-data
              mountPath: /opt/templates/
      volumes:
        - name: shared-data
          emptyDir: {}
        - name: temp-data
          persistentVolumeClaim:
            claimName: efs-pvc-1
It looks like you can do that by including the path as part of the volume handle.
A sub directory of EFS can be mounted inside container. This gives cluster operator the flexibility to restrict the amount of data being accessed from different containers on EFS.
For example:
volumeHandle: [FileSystemId]:[Path]
I think you will need to create two separate PVs and PVCs, one for /opt/templates, and the other for /var/logs, each pointing to a different path on your EFS.
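A rough sketch of what one of those two PV/PVC pairs could look like, using the [FileSystemId]:[Path] form of the volume handle. The names efs-pv-app1 and efs-pvc-app1 and the path /app1-templates are hypothetical, and the directory is assumed to already exist on the file system:

```yaml
# Sketch only: names and the /app1-templates path are illustrative assumptions.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv-app1
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    # Same file system ID, but restricted to one sub directory.
    volumeHandle: fs-XXXXX:/app1-templates
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-pvc-app1
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  # Bind explicitly to the PV above rather than to any matching PV.
  volumeName: efs-pv-app1
  resources:
    requests:
      storage: 2Gi
```

The pair for app2 would be identical apart from its names and a different path (for example fs-XXXXX:/app2-logs), so each application only sees its own sub directory.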

Kubernetes on GKE can't mount volumes

There are two nodes and two pods running in my cluster (one pod on each node).
My persistent volume claim is below:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: blockchain-data
  annotations: {
    "volume.beta.kubernetes.io/storage-class": "blockchain-disk"
  }
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ssd
  resources:
    requests:
      storage: 500Gi
and my storage class:
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: blockchain-disk
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
and I mounted it on my container like this
spec:
  containers:
    - image: gcr.io/indiesquare-dev/geth-node:v1.8.12
      imagePullPolicy: IfNotPresent
      name: geth-node
      resources: {}
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - name: blockchain-data
          mountPath: /root/.ethereum
  volumes:
    - name: blockchain-data
      persistentVolumeClaim:
        claimName: blockchain-data
I have replicas set to 2. When I start the deployment, the first pod starts correctly with the disk properly mounted.
However, the second pod gets stuck at ContainerCreating.
If I run kubectl describe pods, I see:
Warning FailedAttachVolume 18m attachdetach-controller Multi-Attach error for volume "pvc-c56fbb79-954f-11e8-870b-4201ac100003" Volume is already exclusively attached to one node and can't be attached to another
According to this message, I think I am trying to attach a disk that is already attached to another node.
What I want to do is to have two persistent volumes separately attached to two pods. If the pods scale up, then each should have a different volume attached.
How can I do this?
You can't attach a GCE Persistent Disk in read-write mode to multiple nodes. So if your pods land on different nodes, they can't reuse the same disk.
To share one disk you would need something like the ReadOnlyMany access mode, but you have ReadWriteOnce.
See https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes#access_modes
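The answer above does not cover the "one volume per pod" part of the question. One common way to get that is a StatefulSet with volumeClaimTemplates, which creates a separate PersistentVolumeClaim for each replica. A minimal sketch, reusing the image, mount path and storage class from the question; the serviceName and the app label are assumptions made for illustration:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: geth-node
spec:
  # Hypothetical headless Service name; a StatefulSet requires this field.
  serviceName: geth-node
  replicas: 2
  selector:
    matchLabels:
      app: geth-node
  template:
    metadata:
      labels:
        app: geth-node
    spec:
      containers:
        - name: geth-node
          image: gcr.io/indiesquare-dev/geth-node:v1.8.12
          volumeMounts:
            - name: blockchain-data
              mountPath: /root/.ethereum
  # One claim is created per replica from this template.
  volumeClaimTemplates:
    - metadata:
        name: blockchain-data
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: blockchain-disk
        resources:
          requests:
            storage: 500Gi
```

Each replica then gets its own claim (named after the template, the StatefulSet and the pod ordinal), so scaling up provisions a new disk instead of trying to re-attach the existing one.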

Deploying Django Application with PostgreSQL to Kubernetes Google Cloud cluster

I am having trouble trying to deploy my Django Application and PostgreSQL database to Kubernetes Google Cloud cluster that I've already configured.
I have successfully created Docker containers for my Django Application and PostgreSQL database. Here is what my docker-compose.yml file looks like:
version: '3'
services:
  db:
    image: postgres
    environment:
      - POSTGRES_USER=stefan_radonjic
      - POSTGRES_PASSWORD=cepajecar995
      - POSTGRES_DB=agent_technologies_db
  web:
    build: .
    command: python manage.py runserver 0.0.0.0:8000 --settings=agents.config.docker-settings
    volumes:
      - .:/agent-technologies
    ports:
      - "8000:8000"
    links:
      - db
    depends_on:
      - db
I have already built the images and tried the sudo docker-compose up command, and the application works perfectly fine.
After successfully dockerizing the Django application and PostgreSQL, I tried to write the Deployment / Service YAML files required by Kubernetes, but I am having trouble doing so. For example:
deployment-definition.yml - File for deploying Django application:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-technologies-deployment
  labels:
    app: agent-technologies
    tier: backend
spec:
  template:
    metadata:
      name: agent-technologies-pod
      labels:
        app: agent-technologies
        tier: backend
    spec:
      containers:
        - name:
          image:
          ports:
            - containerPort: 8000
  replicas:
  selector:
    matchLabels:
      tier: backend
Inside the containers list of dictionaries, I know that my container name should be web, but I am not sure where the image of that container is located, so I do not know what I should specify as the container image.
Another problem lies in postgres/deployment-definition.yml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres-container
  template:
    metadata:
      labels:
        app: postgres-container
        tier: backend
    spec:
      containers:
        - name: postgres-container
          image: postgres:9.6.6
          env:
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: user
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: password
            - name: POSTGRES_DB
              value: agent_technologies_db
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: postgres-volume-mount
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: postgres-volume-mount
          persistentVolumeClaim:
            claimName: postgres-pvc
I do not understand what volumeMounts and volumes are for, or whether I even specified them correctly.
Here is my secret-definition.yml file:
apiVersion: v1
kind: Secret
metadata:
  name: postgres-credentials
type: Opaque
data:
  user: stefan_radonjic
  passowrd: cepajecar995
My postgres/service-definition.yml file:
apiVersion: v1
kind: Service
metadata:
  name: postgres-service
spec:
  selector:
    app: postgres-container
  ports:
    - protocol: TCP
      port: 5432
      targetPort: 5432
My postgres/volume-definition.yml file:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-pv
  labels:
    type: local
spec:
  capacity:
    storage: 2Gi
  storageClassName: standard
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /data/postgres-pv
And my postgres/volume-claim-definitono.yml file:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-pv
  labels:
    type: local
spec:
  capacity:
    storage: 2Gi
  storageClassName: standard
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /data/postgres-pv
Last but not least, my service-definition.yml file - for Django application
apiVersion: v1
kind: Service
metadata:
  name: agent-technologies-service
spec:
  selector:
    app: agent-technologies
  ports:
    - protocol: TCP
      port: 8000
      targetPort: 8000
  type: NodePort
Besides the questions I have already asked above, I also want to ask: am I doing this right? If not, what can I do to fix it?
Inside the containers list of dictionaries, I know that my container name should be web, but I am not sure where the image of that container is located, so I do not know what I should specify as the container image.
A container's name is local to its pod (several containers can share the same pod). The container name (web in your case) is set in your deployment file under containers:
# setting name of first container within pod to web
spec:
  containers:
    - name: web
The image for the container has to be in some available Docker container registry. There are multiple options, from hosting your own registry to using publicly available ones. In any case, you have to be able to push to that registry in your build phase (be it Amazon ECR, Docker Hub, GitLab, self-hosted...) and to pull from it from within Kubernetes (security settings, pull secrets, etc.). Your docker-compose file uses two containers. For db you use the public postgres image; for web you use a build command, and the resulting image is stored only in the local Docker image store on that host (you have to push it to a registry the cluster can reach for k8s to be able to pull it during deployment).
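As an illustration, the Django Deployment from the question might look like this once the name and image are filled in. The image reference is a placeholder, not a real location, and replicas: 1 and the app-based selector are assumptions, since the question left those fields empty:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-technologies-deployment
  labels:
    app: agent-technologies
    tier: backend
spec:
  replicas: 1   # assumed; the question left this empty
  selector:
    matchLabels:
      app: agent-technologies
  template:
    metadata:
      labels:
        app: agent-technologies
        tier: backend
    spec:
      containers:
        - name: web
          # Placeholder: replace with the image you pushed to your own registry.
          image: gcr.io/YOUR_PROJECT_ID/agent-technologies:v1
          ports:
            - containerPort: 8000
```

Selecting on the app label rather than tier keeps this Deployment from also matching the postgres pods, which carry the same tier: backend label.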
I do not understand what volumeMounts and volumes are for, or whether I even specified them correctly.
In a nutshell, volumes are for attaching storage to containers. Depending on your use case and chosen architecture there are several approaches to volumes, but all in all they boil down to ephemeral, constant and persistent. Ephemeral volumes are lost when the pod goes away, constant ones (such as those from configMaps) are used for passing configuration files to containers, and persistent ones are the most interesting for stateful applications (databases, among other things). You can specify volumes in several ways; every volume has to have a name (to be referenced by a volumeMount) and either a direct volume specification or a volume claim specification (the latter is advised for persistent volumes, since you can benefit from automatic provisioning that way).
volumeMounts define where on the container file system a previously defined volume should be mounted. They reference the volume by name, give the mount point on the container filesystem via mountPath, and in some cases can use a subPath to mount a specific file or directory.
In your example you tied the volume obtained through the persistent volume claim to the data path of postgres (/var/lib/postgresql/data). Although you use a storage class that you didn't define, the interesting part is that your PersistentVolume is defined as a hostPath on the host. That means that on each node where this database pod is started, /var/lib/postgresql/data of that pod's db container ends up pointing to /data/postgres-pv on that specific node. This opens you up to the following issue: say you have 3 nodes (A, B and C) and your database pod is started on A and uses A's /data/postgres-pv folder as its own /var/lib/postgresql/data. Then you restart it; it gets terminated and rescheduled to node B. All of a sudden it uses B's /data/postgres-pv local folder (empty) and you end up with an empty database. If you use the host's local filesystem for persistence, you need to tie such pods to a node with node selectors (or, better yet, with affinity). Running database volumes off the local filesystem is advisable for performance reasons, but those pods lose the ability to be rescheduled easily. Another approach is to have some truly persistent volume that can be mounted independently of the node (Amazon EBS, for example); such volumes require a different PVC (or a provisioner to be used).
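Pinning the database pod to one node can be sketched with a nodeSelector in the pod template. This is only a fragment of the postgres Deployment, and node-a is a hypothetical node name:

```yaml
# Pod template fragment (spec.template.spec of the postgres Deployment).
spec:
  nodeSelector:
    # Hypothetical node name; kubernetes.io/hostname is a standard node label.
    kubernetes.io/hostname: node-a
  containers:
    - name: postgres-container
      image: postgres:9.6.6
      volumeMounts:
        - name: postgres-volume-mount
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: postgres-volume-mount
      persistentVolumeClaim:
        claimName: postgres-pvc
```

The trade-off is the one described above: the pod always finds the same /data/postgres-pv directory, but it cannot be rescheduled if that node is unavailable.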
Besides the questions I have already asked above, I also want to ask: am I doing this right? If not, what can I do to fix it?
As stated above, define a storage class and either lock the db pod to a specific node or apply some kind of dynamic provisioning so that the volume follows the pod's placement on nodes.
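For the dynamic provisioning route, a claim along these lines could stand in for the hand-written hostPath PersistentVolume. It is a sketch that assumes the cluster has a storage class named standard backed by a dynamic provisioner, which should be checked with kubectl get storageclass:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
spec:
  # Assumed to exist and to be backed by a dynamic provisioner.
  storageClassName: standard
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
```

The name postgres-pvc matches the claimName already referenced by the postgres Deployment, so the pod spec would not need to change.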
Opinionated preference: don't place everything in the default namespace; use a separate namespace for your k8s manifests. Later on it is much harder to move everything, and a separate namespace makes it harder to accidentally delete the wrong thing.
Also a personal preference: a database is a stateful application, and as such it is advised to use a StatefulSet instead of a Deployment.
There are tools to help you out when you start from docker-compose files and want to convert them to Kubernetes manifests; they are worth checking.
The Kubernetes documentation is a bit outdated but quite good, and you can find some nice reading on volumes and volume claims there; there is also an active Slack channel.
Oh, and mock the user/pass when posting files here; now we know about cepa...
Lastly, you are doing a great job!

Mount kubernetes' volume in development machine

I have a Kubernetes cluster running on Google Cloud Platform. I have 3 nodes and several pods running on these nodes.
One of the pods runs Ghost blog platform and has mounted a gcePersistentDisk volume. The manifest file to create the pod:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    name: ghost
  name: ghost
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: ghost
    spec:
      containers:
        - image: ghost:0.7
          name: ghost
          env:
            - name: NODE_ENV
              value: production
          ports:
            - containerPort: 2368
              name: http-server
          volumeMounts:
            - name: ghost
              mountPath: /var/lib/ghost
      volumes:
        - name: ghost
          gcePersistentDisk:
            pdName: ghost
            fsType: ext4
I'd like some way to access this volume from my development machine. Is there any way to mount this disk on my machine?
If your development machine is not part of the GCE cluster (i.e. it is not a GCE VM), then you will not be able to mount the disk directly. Your best bet in that case is to SSH into a machine it is mounted on (i.e. the node your pod is scheduled to).

Kubernetes: How can I set the number of replicas more than 1 using awsElasticBlockStore?

I'm trying to create a Cassandra cluster in Kubernetes. I want to use awsElasticBlockStore to make the data persistent. As a result, I've written a YAML file like the following for the corresponding Replication Controller:
apiVersion: v1
kind: ReplicationController
metadata:
  name: cassandra-rc
spec:
  # Question: How can I do this?
  replicas: 2
  selector:
    name: cassandra
  template:
    metadata:
      labels:
        name: cassandra
    spec:
      containers:
        - resources:
            limits:
              cpu: 1.0
          image: cassandra:2.2.6
          name: cassandra
          ports:
            - containerPort: 7000
              name: comm
            - containerPort: 9042
              name: cql
            - containerPort: 9160
              name: thrift
          volumeMounts:
            - name: cassandra-persistent-storage
              mountPath: /cassandra_data
      volumes:
        - name: cassandra-persistent-storage
          awsElasticBlockStore:
            volumeID: aws://ap-northeast-1c/vol-xxxxxxxx
            fsType: ext4
However, only one pod can be properly launched with this configuration.
$ kubectl get pods
NAME                 READY   STATUS              RESTARTS   AGE
cassandra-rc-xxxxx   0/1     ContainerCreating   0          5m
cassandra-rc-yyyyy   1/1     Running             0          5m
When I run $ kubectl describe pod cassandra-rc-xxxxx, I see an error like the following:
Error syncing pod, skipping: Could not attach EBS Disk "aws://ap-northeast-1c/vol-xxxxxxxx": Error attaching EBS volume: VolumeInUse: vol-xxxxxxxx is already attached to an instance
That's understandable, because an EBS volume can be mounted from only one node. So only one pod can successfully mount the volume and boot up, while the others just fail.
Is there any good solution for this? Do I need to create multiple Replication Controllers for each pod?
You are correct: one EBS volume can only be attached to a single EC2 instance at a given time. To solve this, you have the following options:
Use multiple EBS volumes with multiple Replication Controllers
Use a distributed file system (e.g. Gluster) and avoid EBS issue
Follow along with PetSet (https://github.com/kubernetes/kubernetes/issues/260), which was later renamed StatefulSet