I am having trouble deploying my Django application and PostgreSQL database to a Kubernetes cluster on Google Cloud that I've already configured.
I have successfully created Docker containers for my Django Application and PostgreSQL database. Here is what my docker-compose.yml file looks like:
version: '3'
services:
  db:
    image: postgres
    environment:
      - POSTGRES_USER=stefan_radonjic
      - POSTGRES_PASSWORD=cepajecar995
      - POSTGRES_DB=agent_technologies_db
  web:
    build: .
    command: python manage.py runserver 0.0.0.0:8000 --settings=agents.config.docker-settings
    volumes:
      - .:/agent-technologies
    ports:
      - "8000:8000"
    links:
      - db
    depends_on:
      - db
I have already built the images and tried the sudo docker-compose up command, and the application works perfectly fine.
After successfully dockerizing the Django application and PostgreSQL, I tried to configure the Deployment / Service YAML files required by Kubernetes, but I am having trouble doing so. For example:
deployment-definition.yml - file for deploying the Django application:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-technologies-deployment
  labels:
    app: agent-technologies
    tier: backend
spec:
  template:
    metadata:
      name: agent-technologies-pod
      labels:
        app: agent-technologies
        tier: backend
    spec:
      containers:
        - name:
          image:
          ports:
            - containerPort: 8000
  replicas:
  selector:
    matchLabels:
      tier: backend
Inside the containers list of dictionaries, I know that my container name should be web, but I am not sure where the image of that container is located, so I do not know what I should specify as the container image.
Another problem lies in postgres/deployment-definition.yml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres-container
  template:
    metadata:
      labels:
        app: postgres-container
        tier: backend
    spec:
      containers:
        - name: postgres-container
          image: postgres:9.6.6
          env:
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: user
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: password
            - name: POSTGRES_DB
              value: agent_technologies_db
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: postgres-volume-mount
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: postgres-volume-mount
          persistentVolumeClaim:
            claimName: postgres-pvc
I do not understand what volumeMounts and volumes are for, or whether I even specified them correctly.
Here is my secret-definition.yml file:
apiVersion: v1
kind: Secret
metadata:
  name: postgres-credentials
type: Opaque
data:
  user: stefan_radonjic
  password: cepajecar995
My postgres/service-definition.yml file:
apiVersion: v1
kind: Service
metadata:
  name: postgres-service
spec:
  selector:
    app: postgres-container
  ports:
    - protocol: TCP
      port: 5432
      targetPort: 5432
My postgres/volume-definition.yml file:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-pv
  labels:
    type: local
spec:
  capacity:
    storage: 2Gi
  storageClassName: standard
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /data/postgres-pv
And my postgres/volume-claim-definition.yml file:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
  labels:
    type: local
spec:
  storageClassName: standard
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
Last but not least, my service-definition.yml file for the Django application:
apiVersion: v1
kind: Service
metadata:
  name: agent-technologies-service
spec:
  selector:
    app: agent-technologies
  ports:
    - protocol: TCP
      port: 8000
      targetPort: 8000
  type: NodePort
Besides the questions I have already asked above, I also want to ask: am I doing this right? If not, what can I do to fix it?
Inside the containers list of dictionaries, I know that my container name should be web, but I am not sure where the image of that container is located, so I do not know what I should specify as the container image.
The container name is local to the pod (you can have several containers sharing the same pod). The container name (web in your case) goes in your Deployment file like this:
# setting the name of the first container within the pod to web
spec:
  containers:
    - name: web
The container image has to live in some Docker registry that the cluster can reach. There are multiple options, from hosting your own registry to using publicly available ones. In any case, your build phase has to be able to push to that registry (be it Amazon ECR, Docker Hub, GitLab, self-hosted...) and Kubernetes has to be able to pull from it (security settings, pull secrets, etc.). In your docker-compose file you use two containers: for db you use the public postgres image, while for web you use the build command, so that image is stored only in the local Docker image cache on that host (you have to push it to a reachable registry for k8s to be able to pull it during deployment).
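For illustration only, assuming you tag and push your locally built web image to Google Container Registry under your own project (the registry path and tag below are placeholders, not something your build already produces), the container section of the Django deployment could then look like this:

spec:
  containers:
    - name: web
      # assumed GCR path; replace <your-project-id> and the tag with your own
      image: gcr.io/<your-project-id>/agent-technologies:1.0
      ports:
        - containerPort: 8000

You would push the image with something like docker tag agent-technologies gcr.io/<your-project-id>/agent-technologies:1.0 followed by docker push gcr.io/<your-project-id>/agent-technologies:1.0, after authenticating Docker against GCR.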
I do not understand what volumeMounts and volumes are for, or whether I even specified them correctly.
In a nutshell, volumes are for attaching storage to containers. Depending on your use case and chosen architecture there are several approaches to volumes, but they all boil down to ephemeral, constant and persistent. Ephemeral volumes are lost on container termination or restart, constant ones (such as those from ConfigMaps) are used for passing configuration files to containers, and persistent ones are the most interesting for stateful applications (databases, among other things). You can specify volumes in several ways; all volumes have to have a name (to be referenced by a volumeMount) and either a direct volume specification or a volume claim specification (the latter is advised for persistent volumes, since you can benefit from automatic provisioning that way).
volumeMounts define where on the container's file system a predefined volume should be mounted. They reference the volume to be mounted by name, provide the mount point on the container filesystem via mountPath, and in some cases can use subPath to point to specific files.
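As a minimal illustration of how the two halves reference each other (the names here are placeholders, not taken from your files):

spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: data            # must match a volume name declared below
          mountPath: /var/data  # where the volume appears inside the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: some-claim   # placeholder PVC name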
In your example you tied the volume obtained through a persistent volume claim to the data path of postgres (/var/lib/postgresql/data). Although you reference a storage class that you didn't define, the interesting part is that your PersistentVolume is defined as a local path on the host. That means that on each node where this database pod is started, /var/lib/postgresql/data of that pod's db container points to /data/postgres-pv on that specific node. This opens you up to the following issue: say you have 3 nodes (A, B and C) and your database pod is started on A, using A's /data/postgres-pv folder as its own /var/lib/postgresql/data. Then you restart it, it gets terminated and rescheduled to node B. All of a sudden, it uses B's /data/postgres-pv local folder (empty) and you end up with an empty database. If you use the host's local filesystem for persistence you need to tie such pods to nodes with node selectors (or, better yet, affinity). It can be advisable for performance reasons to run database volumes off the local filesystem, but those pods lose the ability to be rescheduled easily. Another approach is to have some truly persistent volume that can be mounted independently of the node (Amazon EBS, for example); these require a different PVC (or provisioner) to be used.
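If you do stick with a hostPath-backed volume, a rough sketch of pinning the pod to one node with nodeAffinity might look like this (the hostname value is a placeholder for your actual node name):

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                  - gke-my-cluster-node-1   # placeholder; use the actual node name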
Besides the questions I have already asked above, I also want to ask: am I doing this right? If not, what can I do to fix it?
As stated above, define a storage class and either lock the db pod to a specific node or apply some kind of dynamic provisioning so the volume will follow the pod's placement on nodes.
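For example, on GKE a PersistentVolumeClaim against the cluster's default standard storage class is typically enough to get a GCE persistent disk provisioned dynamically; a sketch (note that GCE persistent disks support ReadWriteOnce rather than ReadWriteMany):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
spec:
  storageClassName: standard   # GKE's default, GCE-PD-backed storage class
  accessModes:
    - ReadWriteOnce            # GCE persistent disks do not support ReadWriteMany
  resources:
    requests:
      storage: 2Gi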
Opinionated preference: don't place everything in the default namespace; use a separate namespace for your k8s manifests. It is much harder to move everything out of the default namespace later, and separate namespaces make it harder to accidentally delete the wrong thing...
Also a personal preference: a database is a stateful application, so it is advised to use a StatefulSet instead of a Deployment.
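A hedged sketch of what the Postgres workload could look like as a StatefulSet with a volumeClaimTemplate, reusing your names (env and secret wiring omitted for brevity; details are illustrative, not a drop-in replacement):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres-service      # the Service governing the set
  replicas: 1
  selector:
    matchLabels:
      app: postgres-container
  template:
    metadata:
      labels:
        app: postgres-container
        tier: backend
    spec:
      containers:
        - name: postgres-container
          image: postgres:9.6.6
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: postgres-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 2Gi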
There are tools to help you out when you start from docker-compose files and want to convert them to Kubernetes manifests; they are worth checking out.
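One such tool is kompose; roughly (assuming it is installed and run next to your compose file):

kompose convert -f docker-compose.yml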
The Kubernetes documentation is a bit outdated in places but quite good, and you can have a nice read on volumes and volumeClaims there; there is also an active Slack channel.
Oh, and mock the user/pass when posting files here; we now know about cepa...
Lastly, you are doing a great job!
Related
I am trying to find a solution to make use of the same Amazon EFS for mounting multiple directories in a Kubernetes deployment. Here is my use case:
I have an application named app1 that needs to persist a directory named "/opt/templates" to EFS
I have another application named app2 that needs to persist a directory named "/var/logs" to EFS
We deploy the applications as Kubernetes Pods in an Amazon EKS cluster. If I use the same EFS for both of the above mounts, I can see all the files from both directories "/opt/templates" and "/var/logs", since I am using the same EFS.
How can I solve the problem of using the same EFS for both applications without seeing app1's mounted files in app2's directory? Is it even possible to use the same EFS ID for multiple applications?
Here are the Kubernetes manifests I used for one of the applications, including the PersistentVolume, PVC and Deployment:
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv-1
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc-report
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-XXXXX
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-pvc-1
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 2Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: deploy1
  template:
    metadata:
      labels:
        app: deploy1
    spec:
      containers:
        - name: app1
          image: imageXXXX
          ports:
            - containerPort: 6455
          volumeMounts:
            - name: temp-data
              mountPath: /opt/templates/
      volumes:
        - name: shared-data
          emptyDir: {}
        - name: temp-data
          persistentVolumeClaim:
            claimName: efs-pvc-1
It looks like you can do that by including the path as part of the volume handle.
A subdirectory of the EFS file system can be mounted inside the container. This gives the cluster operator the flexibility to restrict the amount of data being accessed from different containers on EFS.
For example:
volumeHandle: [FileSystemId]:[Path]
I think you will need to create two separate PVs and PVCs, one for /opt/templates, and the other for /var/logs, each pointing to a different path on your EFS.
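As a sketch of that approach, mirroring your existing PV (the file system ID and sub-path are placeholders; you would create a second PV/PVC pair for /var/logs the same way):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv-templates
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc-report
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-XXXXX:/templates   # <FileSystemId>:<Path>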
After creating an NFS Persistent Volume for one of the Deployments running in a cluster, the containers are able to store and share file data with each other. The file data persists across the containers' life cycles too. And that's great! But I wonder where exactly this file data is stored: where is it "physically" located? Is it saved on the container itself, or is it saved somewhere on a VM's disk - the VM that is used to run the Deployment?
The VM that is used to host the Deployment has only 20 GB of available disk space by default. Let's say I am running a Docker container inside a pod on a node (i.e. a VM) running some file server. What happens if I attempt to transfer a 100 GB file to that file server? Where will this gigantic file be saved if the VM disk itself has only 20 GB of available space?
Edited later to append the relevant portion of the YAML file:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-claim
  labels:
    app: deployment
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
# ---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment
spec:
  selector:
    matchLabels:
      app: app
  replicas: 1
  minReadySeconds: 10
  strategy:
    type: RollingUpdate # Recreate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: container
          image: 12345.dkr.ecr.us-west-2.amazonaws.com/container:v001
          ports:
            - containerPort: 80
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: volume-mount
              mountPath: /data
      volumes:
        - name: volume-mount
          persistentVolumeClaim:
            claimName: pv-claim
The "physical" location of the volume is defined by the provisioner, which is defined by the storage class. Your PV claim doesn't have a storage class assigned. That means that the default storage class is used, and it can be anything. I suspect that in EKS default storage class will be EBS, but you should double check that.
First, see what storage class is actually assigned to your persistent volumes:
kubectl get pv -o wide
Then see what provisioner is assigned to that storage class:
kubectl get storageclass
Most likely you will see something like kubernetes.io/aws-ebs. Then google the documentation for that specific provisioner to understand where the volume is stored "physically".
In your case the data is stored on an NFS share. Connect to the NFS server and browse through the shares to find the one that is mounted into the pod.
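For example (assuming you can reach the NFS server and have the standard NFS client utilities installed; the address is a placeholder), you can list its exports and then look inside the matching one:

showmount -e <nfs-server-address>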
I am confused when it comes to deploying the PostgreSQL database of my Django application with Kubernetes. Here is how I have constructed my deployment-definition.yml file:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres-container
  template:
    metadata:
      labels:
        app: postgres-container
        tier: backend
    spec:
      containers:
        - name: postgres-container
          image: postgres:9.6.6
          env:
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: user
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: password
            - name: POSTGRES_DB
              value: agent_technologies_db
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: postgres-volume-mount
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: postgres-volume-mount
          persistentVolumeClaim:
            claimName: postgres-pvc
        - name: postgres-credentials
          secret:
            secretName: postgres-credentials
What I don't understand is this: if I specify (like I did) an existing PostgreSQL image inside the spec of the Kubernetes Deployment object, how do I actually run my application? What do I need to specify as HOST inside my settings.py file?
Here is what my settings.py file looks like for now:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'agent_technologies_db',
        'USER': 'stefan_radonjic',
        'PASSWORD': 'cepajecar995',
        'HOST': 'localhost',
        'PORT': '',
    }
}
It is constructed this way because I am still designing the application and I do not want to deploy it to the Kubernetes cluster just yet. But when I do, what am I supposed to specify for HOST and PORT? And also, is this the right way to deploy PostgreSQL to a Kubernetes cluster?
Thank you in advance!
*** QUESTION UPDATE ***
As suggested, I have created service.yml:
apiVersion: v1
kind: Service
metadata:
  name: postgres-service
spec:
  selector:
    app: postgres-container
    tier: backend
  ports:
    - protocol: TCP
      port: 5432
      targetPort: 5432
  type: ClusterIP
And I have updated my settings.py file:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'agent_technologies_db',
        'USER': 'stefan_radonjic',
        'PASSWORD': 'cepajecar995',
        'HOST': 'postgres-service',
        'PORT': 5432,
    }
}
But I am getting the following error:
In order to allow communication with your PostgreSQL deployment in Kubernetes, you need to set up a Service object. If your Django app will live in the same cluster as your PostgreSQL deployment, then you will want a ClusterIP type service; otherwise, if your Django app lives outside of your cluster, you will want a LoadBalancer or NodePort type service.
There are two ways to create a service:
YAML
The first is through a YAML file, which in your case would look like this:
kind: Service
apiVersion: v1
metadata:
  name: postgres
spec:
  selector:
    app: postgres-container
    tier: backend
  ports:
    - name: postgres
      protocol: TCP
      port: 5432
      targetPort: 5432
The .spec.selector field defines the target of the Service. This service will target pods with the labels app=postgres-container and tier=backend. It exposes port 5432 of the container. In your Django configuration, you would put the name of the service as the HOST: in this case, the name is simply postgres. Kubernetes will resolve the service name to the matching pod IP and route traffic to the pod. The port will be the port of the service: 5432.
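In terms of your settings.py, that would look roughly like this (only HOST and PORT change; the rest stays as in your file):

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'agent_technologies_db',
        'USER': 'stefan_radonjic',
        'PASSWORD': 'cepajecar995',
        'HOST': 'postgres',   # the Service name defined above
        'PORT': 5432,
    }
}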
kubectl expose
The other way of creating a service is through the kubectl expose command:
kubectl expose deployment/postgres
This command will default to a ClusterIP type service and expose the ports defined in the .spec.containers.ports fields of the Deployment YAML.
More info:
https://kubernetes.io/docs/concepts/services-networking/service/
And also, is this the right way to deploy PostgreSQL to a Kubernetes cluster?
This depends on a few variables. Do you plan on deploying a Postgres cluster? If so, you may want to look into using a StatefulSet:
StatefulSets are valuable for applications that require one or more of the following.
Stable, unique network identifiers.
Stable, persistent storage.
Ordered, graceful deployment and scaling.
Ordered, graceful deletion and termination.
Ordered, automated rolling updates.
https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#using-statefulsets
Do you have someone knowledgeable about Postgres who is going to configure and maintain it? If not, I would also recommend that you look into deploying a managed Postgres server outside of the cluster (e.g. RDS). You can still deploy your Django app within the cluster and connect to your DB via an ExternalName service.
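A hedged sketch of such an ExternalName service (the endpoint is a placeholder for your managed database's DNS name):

apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  type: ExternalName
  externalName: mydb.abc123xyz.us-east-1.rds.amazonaws.com   # placeholder RDS endpoint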
The reason I recommend this is that managing stateful applications in a Kubernetes cluster can be challenging. I'm not familiar with Postgres, but here's a cautionary tale of running Postgres on Kubernetes: https://gravitational.com/blog/running-postgresql-on-kubernetes/
In addition to that, here are a few experiences I've run into that have influenced my decision to move stateful workloads out of my cluster:
Stuck volumes
If you're using AWS EBS volumes, volumes can get "stuck" on a node and fail to detach and reattach to a new node if your DB pod gets rescheduled to a new node.
Migrating to a new cluster
If you ever need to move your workloads to a new cluster, you will have to deal with the added challenge of moving your state to the new cluster as well without suffering any data loss. If you move your stateful apps outside of the cluster then you can treat the whole cluster as cattle, and then tearing it down and migrating to a new cluster becomes a whole lot easier.
More info:
K8s blog post on deploying Postgres with StatefulSets: https://kubernetes.io/blog/2017/02/postgresql-clusters-kubernetes-statefulsets/
You have 2 cases.
1) Your application runs inside the Kubernetes cluster.
You need to reference your Postgres pods through a Service.
apiVersion: v1
kind: Service
metadata:
  labels:
    app: postgres-container
    tier: backend
  name: postgres
spec:
  ports:
    - port: 5432
      protocol: TCP
  selector:
    app: postgres-container
  sessionAffinity: None
  type: ClusterIP
Then use postgres (the Service name) wherever you need to specify your Postgres host.
2) Your application runs outside the Kubernetes cluster.
In this case you have to provide a way to get into the cluster from outside, either through a LoadBalancer or through an Ingress.
In this case too you have to create a Service (see point 1).
Here is an example with an Ingress:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-tutorial
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - host: my_kube.info
      http:
        paths:
          - path: /
            backend:
              serviceName: postgres
              servicePort: 5432
my_kube.info (or whatever name you choose) must be resolvable (via DNS, or by adding a line to /etc/hosts).
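For example, a single /etc/hosts entry pointing the name at your ingress controller's external IP would do for testing (the IP below is a placeholder):

203.0.113.10  my_kube.info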
If you need an HA Postgres manager, you may take a look at: http://stolon.io/
I'm trying to use glusterfs installed directly on my GCE cluster nodes.
The installation does not persist through cluster upgrades, which could be solved with a bootstrap script.
The problem is that when I reinstalled glusterfs manually and mounted the brick, there were no volumes present, and I had to force-recreate them.
What happened? Does glusterfs store volume data somewhere other than on the bricks? How do I prevent this?
Can I confirm you are doing this on a Kubernetes cluster? I presume you are, as you mentioned cluster upgrades.
If so, when you say gluster was installed directly on your nodes, I'm not sure I understand that part of your post. My understanding of the intended use of glusterfs is that it exists as a distributed file system, and the storage is therefore part of a separate cluster from the Kubernetes nodes.
I believe this is the recommended method to use glusterfs with Kubernetes, and this way the data in the volumes will be retained after the Kubernetes cluster upgrade.
Here are the steps I performed.
I created the glusterfs cluster using the information/script from the first three steps in this tutorial (specifically the 'Clone', 'Bootstrap your Cluster' and 'Create your first volume' steps). In terms of the YAML below, it may be useful to know that my glusterfs volume was named 'glustervolume'.
Once I'd confirmed the gluster volume had been created, I created a Kubernetes service and endpoints that point at that volume. The IP addresses in the endpoints section of the YAML below are the internal IP addresses of the instances in the glusterfs storage cluster.
---
apiVersion: v1
kind: Service
metadata:
  name: glusterfs-cluster
spec:
  ports:
    - port: 1
---
apiVersion: v1
kind: Endpoints
metadata:
  name: glusterfs-cluster
subsets:
  - addresses:
      - ip: 10.132.0.6
    ports:
      - port: 1
  - addresses:
      - ip: 10.132.0.7
    ports:
      - port: 1
  - addresses:
      - ip: 10.132.0.8
    ports:
      - port: 1
I then created a pod to make use of the gluster volume:
---
apiVersion: v1
kind: Pod
metadata:
  name: glusterfs
spec:
  containers:
    - name: glusterfs
      image: nginx
      volumeMounts:
        - mountPath: "/mnt/glusterfs"
          name: glustervolume
  volumes:
    - name: glustervolume
      glusterfs:
        endpoints: glusterfs-cluster
        path: glustervolume
        readOnly: false
As the glusterfs volume exists separately from the Kubernetes cluster (i.e. on its own cluster), Kubernetes upgrades will not affect the volume.
I have a Kubernetes cluster running on Google Cloud Platform. I have 3 nodes and several pods running on these nodes.
One of the pods runs Ghost blog platform and has mounted a gcePersistentDisk volume. The manifest file to create the pod:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    name: ghost
  name: ghost
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: ghost
    spec:
      containers:
        - image: ghost:0.7
          name: ghost
          env:
            - name: NODE_ENV
              value: production
          ports:
            - containerPort: 2368
              name: http-server
          volumeMounts:
            - name: ghost
              mountPath: /var/lib/ghost
      volumes:
        - name: ghost
          gcePersistentDisk:
            pdName: ghost
            fsType: ext4
I'd like some way to access this volume from my development machine. Is there any way to mount this disk on my machine?
If your development machine is not part of the GCE cluster (i.e. a GCE VM), then you will not be able to mount it directly. Your best bet in that case would be to SSH into a machine where it is mounted (i.e. the node your pod is scheduled on).
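For example (the node name and zone are placeholders), you could SSH into that node with gcloud and inspect the disk's mount point from there:

gcloud compute ssh <node-name> --zone <zone>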