Sharing kubernetes volumes between a cronjob and a deployment - amazon-web-services

I'm trying to figure out how to share data between a CronJob and a Kubernetes Deployment.
I'm running Kubernetes hosted on AWS EKS.
I've created a persistent volume with a claim and have tried to mount the claim in both the CronJob and the Deployment containers. However, after the CronJob runs on schedule, the data still isn't in the other container where it should be.
I've seen some threads about using AWS EBS, but I'm not sure what the right approach is.
Another thread talked about running different schedules to get the PersistentVolume.
- name: +vars.cust_id+-sophoscentral-logs
  persistentVolumeClaim:
    claimName: +vars.cust_id+-sophoscentral-logs-pvc
---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: +vars.cust_id+-sp-logs-pv
spec:
  persistentVolumeReclaimPolicy: Retain
  claimRef:
    name: +vars.cust_id+-sp-logs-pvc
    namespace: +vars.namespace+
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: "/var/lib/+vars.cust_id+-sophosdata"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: +vars.cust_id+-sp-logs-pvc
  namespace: +vars.namespace+
  labels:
    component: sp
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  volumeName: +vars.cust_id+-sp-logs-pv

EBS volumes do not support ReadWriteMany as an access mode. If you want to stay within the AWS ecosystem, you would need to use EFS, which is a hosted NFS product. Other options include self-hosted Ceph or Gluster and their related CephFS and GlusterFS tools.
Shared read-write filesystems like these should generally be avoided if possible. NFS brings a whole host of problems to the table, and while CephFS (and probably GlusterFS, but I'm less familiar with that one personally) is better, it's still a far cry from a "normal" network block device volume. Make sure you understand the limitations this brings with it before you include it in a system design.
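If you do go the EFS route, the setup could look roughly like the sketch below. It assumes the aws-efs-csi-driver is already installed in the cluster. Every name here (`efs-sc`, `shared-logs-pv`, `shared-logs-pvc`, `log-fetcher`, `log-reader`), the `busybox` image, the schedule, and the file system ID `fs-xxxxxxxx` are placeholders for illustration, not values from the question.

```yaml
# StorageClass name "efs-sc" is an assumption; it only ties the PV and PVC together.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-logs-pv
spec:
  capacity:
    storage: 10Gi              # required by the API; EFS itself is elastic
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-xxxxxxxx  # placeholder EFS file system ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-logs-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 10Gi
---
# Writer: the CronJob appends to a file on the shared volume.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: log-fetcher
spec:
  schedule: "*/15 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: fetch
              image: busybox
              command: ["sh", "-c", "date >> /logs/fetched.txt"]
              volumeMounts:
                - name: logs
                  mountPath: /logs
          volumes:
            - name: logs
              persistentVolumeClaim:
                claimName: shared-logs-pvc
---
# Reader: the Deployment mounts the same claim at the same path.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: log-reader
spec:
  replicas: 1
  selector:
    matchLabels:
      app: log-reader
  template:
    metadata:
      labels:
        app: log-reader
    spec:
      containers:
        - name: reader
          image: busybox
          command: ["sh", "-c", "tail -F /logs/fetched.txt"]
          volumeMounts:
            - name: logs
              mountPath: /logs
      volumes:
        - name: logs
          persistentVolumeClaim:
            claimName: shared-logs-pvc
```

Both workloads reference the same claim name. The hostPath volume in the question is local to each node, so the two pods would only see the same files if they happened to be scheduled onto the same node.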

Related

AWS EFS Access Point mount

I have created an AWS EFS file system and an access point: /ap
I want to mount that access point into a Kubernetes deployment, but it's failing, although it works when I use /.
These are the manifests I am using.
PV
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 1Mi
  mountOptions:
    - rsize=1048576
    - wsize=1048576
    - hard
    - timeo=600
    - retrans=2
    - noresvport
  persistentVolumeReclaimPolicy: Retain
  nfs:
    path: /ap
    server: fs-xxx.efs.region.amazonaws.com
  claimRef:
    name: efs-pvc
    namespace: product
PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-pvc
  namespace: product
spec:
  storageClassName: ""
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
And I receive this upon starting a deployment.
Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[data default-token-qwclp]: timed out waiting for the condition
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/97e66236-cb08-4bee-82a7-6f6cf1db9353/volumes/kubernetes.io~nfs/efs-pv --scope -- mount -t nfs -o hard,noresvport,retrans=2,rsize=1048576,timeo=600,wsize=1048576 fs-559a4f0e.efs.eu-central-1.amazonaws.com:/atc /var/lib/kubelet/pods/97e66236-cb08-4bee-82a7-6f6cf1db9353/volumes/kubernetes.io~nfs/efs-pv
Output: Running scope as unit run-4806.scope.
mount.nfs: Connection timed out
Am I missing something? Or should I use CSI driver instead?
I strongly suggest following these steps:
Step 1: Deploy the EFS CSI driver on your nodes.
Link: https://github.com/kubernetes-sigs/aws-efs-csi-driver
Step 2: Create a new PV and PVC using this tutorial.
Link: https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/master/examples/kubernetes/volume_path/README.md
If you want to specify a path for a folder, you can follow this example:
Link: https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/master/examples/kubernetes/volume_path/specs/example.yaml
It worked for me.
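For the access point specifically, a PV using the CSI driver might look like the sketch below. It assumes the driver from step 1 is installed; the file system ID and access point ID are placeholders. The PV name and `claimRef` reuse the names from the question so that the existing PVC can stay as it is.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 1Mi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  csi:
    driver: efs.csi.aws.com
    # Access point form: <file-system-id>::<access-point-id> (placeholder IDs).
    # To mount a plain sub-directory instead, the form is <file-system-id>:/path.
    volumeHandle: fs-xxxxxxxx::fsap-xxxxxxxxxxxxxxxxx
  claimRef:
    name: efs-pvc
    namespace: product
```

An access point is addressed by its ID rather than by a path on the NFS export, which is one plausible reason a plain `nfs:` volume pointing at `/ap` does not behave like the access point itself.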

What is the use of a volume ID when creating a PV?

I have observed two kinds of syntax for PV and PVC creation in AWS EKS:
1) Using a volume ID when creating both the PV and PVC (create the volume manually and use its ID)
2) Without a volume ID (dynamic provisioning of the PV)
example-1:
- apiVersion: "v1"
  kind: "PersistentVolume"
  metadata:
    name: "pv-aws"
  spec:
    capacity:
      storage: 10G
    accessModes:
      - ReadWriteMany
    persistentVolumeReclaimPolicy: Retain
    storageClassName: gp2
    awsElasticBlockStore:
      volumeID: vol-xxxxxxxx
      fsType: ext4
In this case, I create the volume manually and use its ID to create both the PV and PVC.
example-2:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc1
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2
  resources:
    requests:
      storage: 20Gi
In this case, just creating the PVC creates both the volume in the backend (AWS) and the PV.
What is the difference, and which one should be used in which scenarios? What are the pros and cons?
It should be based on your requirements. Static provisioning is generally not scalable. You have to create the volumes outside of the k8s context. Mounting existing volumes would be useful in disaster recovery scenarios.
Using Storage classes, or dynamic provisioning, is generally preferred because of the convenience. You can create roles and resource quotas to control and limit the storage usage and decrease operational overhead.
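As a rough illustration of the quota point, a ResourceQuota can cap how much storage the PVCs in a namespace may request, both overall and per storage class. The quota name, namespace, and limits below are made up for the example; `gp2` is the storage class from the question.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota          # hypothetical name
  namespace: team-a            # hypothetical namespace
spec:
  hard:
    # Maximum number of PVCs in the namespace
    persistentvolumeclaims: "10"
    # Total storage that all PVCs together may request
    requests.storage: 200Gi
    # Total storage that PVCs using the gp2 storage class may request
    gp2.storageclass.storage.k8s.io/requests.storage: 100Gi
```

With dynamic provisioning, a PVC that would exceed these limits is rejected at creation time, so no volume is created in AWS for it.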

Kubernetes Deployment Persistent Volume and VM disk size

After creating an NFS Persistent Volume for one of the Deployments running in a cluster, the containers are able to store and share file data between each other. The file data also persists across container life cycles. And that's great! But I wonder where exactly this file data is stored: where is it "physically" located? Is it saved in the container itself, or is it saved somewhere on a VM's disk, the VM that is used to run the Deployment?
The VM that hosts the Deployment has only 20 GB of available disk space by default. Let's say I am running a Docker container inside a pod on a Node (aka VM) running some file server. What happens if I attempt to transfer a 100 GB file to that file server? Where will this gigantic file be saved if the VM disk itself has only 20 GB of available space?
Edited later by appending the portion of yaml file:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-claim
  labels:
    app: deployment
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
# ---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment
spec:
  selector:
    matchLabels:
      app: app
  replicas: 1
  minReadySeconds: 10
  strategy:
    type: RollingUpdate # Recreate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: container
          image: 12345.dkr.ecr.us-west-2.amazonaws.com/container:v001
          ports:
            - containerPort: 80
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: volume-mount
              mountPath: /data
      volumes:
        - name: volume-mount
          persistentVolumeClaim:
            claimName: pv-claim
The "physical" location of the volume is defined by the provisioner, which is defined by the storage class. Your PV claim doesn't have a storage class assigned. That means that the default storage class is used, and it can be anything. I suspect that in EKS default storage class will be EBS, but you should double check that.
First, see what storage class is actually assigned to your persistent volumes:
kubectl get pv -o wide
Then see what provisioner is assigned to that storage class:
kubectl get storageclass
Most likely you will see something like kubernetes.io/aws-ebs. Then look up the documentation for that provisioner to understand where the volume is stored "physically".
In your case the data is stored on an NFS share. Connect to the NFS server, browse through the shares, and find the one that is mounted into the pod.
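To avoid depending on whatever the cluster's default storage class happens to be, the claim can name one explicitly. A minimal sketch, where `gp2` is only an assumed class name (use one listed by kubectl get storageclass):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-claim
  labels:
    app: deployment
spec:
  # Assumed class name; the provisioner behind this class decides where data lives.
  storageClassName: gp2
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
```

With a network-backed class, the data lives on the provisioned volume rather than on the node's own disk, so the size of the volume (not the 20 GB VM disk) is what limits how much can be written under the mount path.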

Deploying Django Application with PostgreSQL to Kubernetes Google Cloud cluster

I am having trouble deploying my Django application and PostgreSQL database to a Kubernetes cluster on Google Cloud that I've already configured.
I have successfully created Docker containers for my Django Application and PostgreSQL database. Here is what my docker-compose.yml file looks like:
version: '3'
services:
  db:
    image: postgres
    environment:
      - POSTGRES_USER=stefan_radonjic
      - POSTGRES_PASSWORD=cepajecar995
      - POSTGRES_DB=agent_technologies_db
  web:
    build: .
    command: python manage.py runserver 0.0.0.0:8000 --settings=agents.config.docker-settings
    volumes:
      - .:/agent-technologies
    ports:
      - "8000:8000"
    links:
      - db
    depends_on:
      - db
I have already built the images and tried the sudo docker-compose up command, and the application works perfectly fine.
After successfully dockerizing the Django application and PostgreSQL, I tried to write the Deployment / Service YML files required by Kubernetes, but I am having trouble doing so. For example:
deployment-definition.yml - File for deploying Django application:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-technologies-deployment
  labels:
    app: agent-technologies
    tier: backend
spec:
  template:
    metadata:
      name: agent-technologies-pod
      labels:
        app: agent-technologies
        tier: backend
    spec:
      containers:
        - name:
          image:
          ports:
            - containerPort: 8000
  replicas:
  selector:
    matchLabels:
      tier: backend
Inside container list of dictionaries, I know that my container name should be web, but I am not sure where the image of that container is located so I do not know what should i specify as container image.
Another problem lies in postgres/deployment-definition.yml :
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres-container
  template:
    metadata:
      labels:
        app: postgres-container
        tier: backend
    spec:
      containers:
        - name: postgres-container
          image: postgres:9.6.6
          env:
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: user
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: password
            - name: POSTGRES_DB
              value: agent_technologies_db
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: postgres-volume-mount
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: postgres-volume-mount
          persistentVolumeClaim:
            claimName: postgres-pvc
I do not understand what volumeMounts and volumes are for, and if i even specified them correctly.
Here is my secret-definition.yml file:
apiVersion: v1
kind: Secret
metadata:
  name: postgres-credentials
type: Opaque
data:
  user: stefan_radonjic
  passowrd: cepajecar995
My postgres/service-definition.yml file:
apiVersion: v1
kind: Service
metadata:
  name: postgres-service
spec:
  selector:
    app: postgres-container
  ports:
    - protocol: TCP
      port: 5432
      targetPort: 5432
My postgres/volume-definition.yml file:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-pv
  labels:
    type: local
spec:
  capacity:
    storage: 2Gi
  storageClassName: standard
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /data/postgres-pv
And my postgres/volume-claim-definitono.yml file:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-pv
  labels:
    type: local
spec:
  capacity:
    storage: 2Gi
  storageClassName: standard
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /data/postgres-pv
Last but not least, my service-definition.yml file - for Django application
apiVersion: v1
kind: Service
metadata:
  name: agent-technologies-service
spec:
  selector:
    app: agent-technologies
  ports:
    - protocol: TCP
      port: 8000
      targetPort: 8000
  type: NodePort
Besides the questions I have already asked above, I also want to ask am I doing this right? If not, what can I do to fix this.
> Inside container list of dictionaries, I know that my container name should be web, but I am not sure where the image of that container is located so I do not know what should i specify as container image.
The container name is local to the pod (several containers can share the same pod). The container name (web in your case) is set in your deployment file:
# set the name of the first container within the pod to web
spec:
  containers:
    - name: web
The image for the container has to be in a container registry that the cluster can reach. There are multiple options, from hosting your own registry to using a publicly available one. In any case, you have to be able to push to that registry in your build phase (be it Amazon ECR, Docker Hub, GitLab, self-hosted...) and to pull from it from within Kubernetes (security settings, pull secrets, etc.). In your docker-compose file you use two containers. For db you use the public postgres image. For web you use the build command, so the image is stored only in the local Docker image store on that host; you have to push it to a registry before k8s can pull it during deployment.
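Once the image has been pushed, the pod template of the Django deployment could reference it along these lines. This is only a sketch: the image path `gcr.io/my-project/agent-technologies:v1` and the pull secret name `regcred` are hypothetical and need to be replaced with whatever registry and credentials you actually use.

```yaml
# Fragment: goes under spec.template in the Deployment
spec:
  containers:
    - name: web
      # Hypothetical image reference; use the path you pushed the image to.
      image: gcr.io/my-project/agent-technologies:v1
      ports:
        - containerPort: 8000
  # Only needed for private registries that require credentials;
  # "regcred" is a hypothetical Secret of type kubernetes.io/dockerconfigjson.
  imagePullSecrets:
    - name: regcred
```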
> I do not understand what volumeMounts and volumes are for, and if i even specified them correctly.
In a nutshell, volumes are for attaching storage to containers. Depending on your use case and architecture there are several approaches to volumes, but they all boil down to ephemeral, constant, and persistent. Ephemeral volumes are lost when the pod terminates, constant volumes (such as those from ConfigMaps) are used for passing configuration files to containers, and persistent volumes are the most interesting for stateful applications (databases among other things). You can specify volumes in several ways. Every volume has to have a name (to be referenced by a volumeMount) and either a direct volume specification or a volume claim (the latter is advised for persistent volumes, since you can benefit from automatic provisioning that way).
VolumeMounts define where in the container's filesystem a previously defined volume should be mounted. They reference the volume by name, give the mount point in the container filesystem with mountPath, and can use a subPath to mount a specific file or directory in some cases.
In your example you tied the volume obtained through the persistent volume claim to the data path of postgres (/var/lib/postgresql/data). Although you use a storage class that you didn't define, the interesting part is that your PersistentVolume is defined as a hostPath on the host. That means that on whichever node this database pod is started, /var/lib/postgresql/data in that pod's db container points to /data/postgres-pv on that specific node. This opens you up to the following issue: say you have 3 nodes (A, B and C) and your database pod is started on A, so it uses A's /data/postgres-pv folder as its own /var/lib/postgresql/data. Then you restart it, and it gets terminated and rescheduled to node B. All of a sudden it uses B's local /data/postgres-pv folder (empty), and you end up with an empty database. If you use the host's local filesystem for persistence, you need to tie such pods to a node with node selectors (or better yet, with affinity). Running database volumes off the local filesystem is advisable for performance reasons, but those pods lose the ability to be rescheduled easily. Another approach is to use a truly persistent volume that can be mounted independently of the node (Amazon EBS, for example); this requires a different PVC (or a provisioner).
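Pinning the database pod to one node can be sketched with a nodeSelector in the pod template. The node name `node-a` is hypothetical; `kubernetes.io/hostname` is the standard label that carries the node's hostname.

```yaml
# Fragment: goes under spec.template in the postgres Deployment
spec:
  # Keep the pod on the node whose /data/postgres-pv holds the data.
  nodeSelector:
    kubernetes.io/hostname: node-a   # hypothetical node name
  containers:
    - name: postgres-container
      image: postgres:9.6.6
      volumeMounts:
        - name: postgres-volume-mount
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: postgres-volume-mount
      persistentVolumeClaim:
        claimName: postgres-pvc
```

The trade-off is the one described above: if that node goes away, the pod cannot be scheduled anywhere else until the selector is changed.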
> Besides the questions I have already asked above, I also want to ask am I doing this right? If not, what can I do to fix this.
As stated above, define a storage class and either lock the db pod to a specific node or use some kind of dynamic provisioning so that the volume follows the pod's placement on nodes.
Opinionated preference: don't place everything in the default namespace; use separate namespaces for your k8s manifests. Later on it is much harder to move everything, and separate namespaces make it harder to accidentally delete the wrong thing.
Also a personal preference: a database is a stateful application, and as such it is advised to use a StatefulSet instead of a Deployment.
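A StatefulSet version of the postgres workload might look like the sketch below. It reuses the names from the question where they exist; the claim template name `postgres-data`, the `standard` storage class, and the `PGDATA` sub-directory are assumptions made for the example. The env entries for POSTGRES_USER and POSTGRES_DB are left out for brevity.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  # Governing Service for the StatefulSet (usually a headless Service).
  serviceName: postgres-service
  replicas: 1
  selector:
    matchLabels:
      app: postgres-container
  template:
    metadata:
      labels:
        app: postgres-container
    spec:
      containers:
        - name: postgres-container
          image: postgres:9.6.6
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: password
            # Assumption: keep the data in a sub-directory so that files at the
            # root of a freshly formatted disk (e.g. lost+found) do not get in the way.
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
  # One PVC per replica is created from this template via dynamic provisioning.
  volumeClaimTemplates:
    - metadata:
        name: postgres-data
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: standard   # assumed storage class name
        resources:
          requests:
            storage: 2Gi
```

Because the claim is created per replica and bound to a dynamically provisioned volume, the data follows the pod when it is rescheduled, which is the behaviour the hostPath setup lacks.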
There are tools that help you convert docker-compose files to Kubernetes manifests; they are worth checking out.
The Kubernetes documentation is a bit outdated but quite good, and you can find some nice reading on volumes and volume claims there. There is also an active Slack channel.
Oh, and mask the user/password when posting files here; now we know about cepa...
Lastly, you are doing a great job!
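One more detail on the Secret from the question, as a sketch with placeholder credentials: values under `data` must be base64-encoded, while `stringData` accepts plain text, and the key names have to match the `key:` entries used in the deployment's `secretKeyRef` (`user` and `password`).

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: postgres-credentials
type: Opaque
# stringData takes plain-text values; Kubernetes stores them base64-encoded.
stringData:
  user: example_user           # placeholder value
  password: example_password   # placeholder value
```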

How to make glusterfs survive cluster upgrade

I'm trying to use glusterfs installed directly on my GCE cluster nodes.
The installation does not persist through cluster upgrades, which could be solved with a bootstrap script.
The problem is that when I reinstalled glusterfs manually and mounted the brick, there were no volumes present, and I had to force recreate them.
What happened? Does glusterfs store volume data somewhere else than on bricks? How do I prevent this?
Can I confirm you are doing this on a Kubernetes cluster? I presume you are, as you mentioned cluster upgrades.
If so, I'm not sure I understand the part of your post where you say gluster was installed directly on your nodes. My understanding of the intended use of glusterfs is that it exists as a distributed file system, and the storage is therefore part of a cluster separate from the Kubernetes nodes.
I believe this is the recommended way to use glusterfs with Kubernetes, and this way the data in the volumes will be retained after a Kubernetes cluster upgrade.
Here are the steps I performed.
I created the glusterfs cluster using the information/script from the first three steps of this tutorial (specifically the 'Clone', 'Bootstrap your Cluster' and 'Create your first volume' steps). For the YAML below, it may be useful to know that my glusterfs volume was named 'glustervolume'.
Once I'd confirmed the gluster volume had been created, I created a Kubernetes Service and Endpoints that point at that volume. The IP addresses in the Endpoints section of the YAML below are the internal IP addresses of the instances in the glusterfs storage cluster.
---
apiVersion: v1
kind: Service
metadata:
  name: glusterfs-cluster
spec:
  ports:
    - port: 1
---
apiVersion: v1
kind: Endpoints
metadata:
  name: glusterfs-cluster
subsets:
  - addresses:
      - ip: 10.132.0.6
    ports:
      - port: 1
  - addresses:
      - ip: 10.132.0.7
    ports:
      - port: 1
  - addresses:
      - ip: 10.132.0.8
    ports:
      - port: 1
I then created a pod to make use of the gluster volume:
---
apiVersion: v1
kind: Pod
metadata:
  name: glusterfs
spec:
  containers:
    - name: glusterfs
      image: nginx
      volumeMounts:
        - mountPath: "/mnt/glusterfs"
          name: glustervolume
  volumes:
    - name: glustervolume
      glusterfs:
        endpoints: glusterfs-cluster
        path: glustervolume
        readOnly: false
As the glusterfs volume exists separately from the Kubernetes cluster (i.e. on its own cluster), Kubernetes upgrades will not affect the volume.
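If several workloads need the same gluster volume, the direct volume reference in the pod can be wrapped in a PersistentVolume and claim instead. The sketch below is untested; the names `gluster-pv` and `gluster-pvc` and the 10Gi size are made up, while the endpoints and path are the ones used above. As far as I know the in-tree glusterfs volume plugin has been removed from newer Kubernetes releases, so this only applies to cluster versions that still ship it.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gluster-pv             # hypothetical name
spec:
  capacity:
    storage: 10Gi              # assumed size; required by the API
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  glusterfs:
    endpoints: glusterfs-cluster
    path: glustervolume
    readOnly: false
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gluster-pvc            # hypothetical name
spec:
  # Empty class so the claim binds to the pre-created PV, not a dynamic one.
  storageClassName: ""
  volumeName: gluster-pv
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
```

Pods would then reference `gluster-pvc` through a `persistentVolumeClaim` volume, the same way the other manifests in this thread do.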