how to run celery with django on openshift 3

What is the easiest way to launch a celery beat and worker process in my django pod?
I'm migrating my OpenShift v2 Django app to OpenShift v3. I'm on the Pro subscription. I'm really a noob with OpenShift v3, Docker, containers, and Kubernetes. I have used this tutorial https://blog.openshift.com/migrating-django-applications-openshift-3/ to migrate my app (which works pretty well).
I'm now struggling with how to start Celery. On OpenShift 2 I just used a post_start action hook:
source $OPENSHIFT_HOMEDIR/python/virtenv/bin/activate
python $OPENSHIFT_REPO_DIR/wsgi/podpub/manage.py celery worker\
--pidfile="$OPENSHIFT_DATA_DIR/celery/run/%n.pid"\
--logfile="$OPENSHIFT_DATA_DIR/celery/log/%n.log"\
-c 1\
--autoreload &
python $OPENSHIFT_REPO_DIR/wsgi/podpub/manage.py celery beat\
--pidfile="$OPENSHIFT_DATA_DIR/celery/run/celeryd.pid"\
--logfile="$OPENSHIFT_DATA_DIR/celery/log/celeryd.log" &
It is quite a simple setup. It just uses the Django database as the message broker; no RabbitMQ or anything similar.
Would an OpenShift "job" be appropriate for that? Or should I use the powershift-image (https://pypi.python.org/pypi/powershift-image) action commands instead? I did not understand how to execute them.
Here is the current deployment configuration for my only app "django":
apiVersion: v1
kind: DeploymentConfig
metadata:
  annotations:
    openshift.io/generated-by: OpenShiftNewApp
  creationTimestamp: 2017-12-27T22:58:31Z
  generation: 67
  labels:
    app: django
  name: django
  namespace: myproject
  resourceVersion: "68466321"
  selfLink: /oapi/v1/namespaces/myproject/deploymentconfigs/django
  uid: 64600436-ab49-11e7-ab43-0601fd434256
spec:
  replicas: 1
  selector:
    app: django
    deploymentconfig: django
  strategy:
    activeDeadlineSeconds: 21600
    recreateParams:
      timeoutSeconds: 600
    resources: {}
    rollingParams:
      intervalSeconds: 1
      maxSurge: 25%
      maxUnavailable: 25%
      timeoutSeconds: 600
      updatePeriodSeconds: 1
    type: Recreate
  template:
    metadata:
      annotations:
        openshift.io/generated-by: OpenShiftNewApp
      creationTimestamp: null
      labels:
        app: django
        deploymentconfig: django
    spec:
      containers:
      - image: docker-registry.default.svc:5000/myproject/django@sha256:6a0caac773acc65daad2e6ac87695f9f01ae3c99faba14536e0ec2b65088c808
        imagePullPolicy: Always
        name: django
        ports:
        - containerPort: 8080
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /opt/app-root/src/data
          name: data
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: django-data
  test: false
  triggers:
  - type: ConfigChange
  - imageChangeParams:
      automatic: true
      containerNames:
      - django
      from:
        kind: ImageStreamTag
        name: django:latest
        namespace: myproject
      lastTriggeredImage: docker-registry.default.svc:5000/myproject/django@sha256:6a0caac773acc65daad2e6ac87695f9f01ae3c99faba14536e0ec2b65088c808
    type: ImageChange
I'm using mod_wsgi-express and this is my app.sh
ARGS="$ARGS --log-to-terminal"
ARGS="$ARGS --port 8080"
ARGS="$ARGS --url-alias /static wsgi/static"
exec mod_wsgi-express start-server $ARGS wsgi/application
Help is very much appreciated. Thank you.

I have managed to get it working, though I'm not quite happy with it. I will move to a PostgreSQL database very soon. Here is what I did:
mod_wsgi-express has an option called --service-script which starts an additional process besides the actual app, so I updated my app.sh:
#!/bin/bash
ARGS=""
ARGS="$ARGS --log-to-terminal"
ARGS="$ARGS --port 8080"
ARGS="$ARGS --url-alias /static wsgi/static"
ARGS="$ARGS --service-script celery_starter scripts/startCelery.py"
exec mod_wsgi-express start-server $ARGS wsgi/application
Mind the last ARGS=... line.
I created a Python script that starts up my Celery worker and beat.
startCelery.py:
import subprocess

OPENSHIFT_REPO_DIR = "/opt/app-root/src"
OPENSHIFT_DATA_DIR = "/opt/app-root/src/data"

pathToManagePy = OPENSHIFT_REPO_DIR + "/wsgi/podpub"

worker_cmd = [
    "python",
    pathToManagePy + "/manage.py",
    "celery",
    "worker",
    "--pidfile=" + OPENSHIFT_REPO_DIR + "/%n.pid",
    "--logfile=" + OPENSHIFT_DATA_DIR + "/celery/log/%n.log",
    "-c 1",
    "--autoreload"
]
print(worker_cmd)
subprocess.Popen(worker_cmd, close_fds=True)

beat_cmd = [
    "python",
    pathToManagePy + "/manage.py",
    "celery",
    "beat",
    "--pidfile=" + OPENSHIFT_REPO_DIR + "/celeryd.pid",
    "--logfile=" + OPENSHIFT_DATA_DIR + "/celery/log/celeryd.log",
]
print(beat_cmd)
subprocess.Popen(beat_cmd)
This was actually working, but I kept receiving a message when I tried to launch the Celery worker saying:
"Running a worker with superuser privileges when the worker accepts messages serialized with pickle is a very bad idea!
If you really want to continue then you have to set the C_FORCE_ROOT environment variable (but please think about this before you do)."
Even though I added these configurations to my settings.py in order to remove the pickle serializer, it kept giving me that same error message.
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ACEEPT_CONTENT = ['json']
I don't know why (though note that CELERY_ACEEPT_CONTENT above is a misspelling of CELERY_ACCEPT_CONTENT, so that particular setting would not have taken effect).
At the end I added C_FORCE_ROOT to my .s2i/environment:
C_FORCE_ROOT=true
Now it's working, at least I think so. My next scheduled job will only run in a few hours. I'm still open for any further suggestions and tips.
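A further option, in case piggybacking Celery on the web process is not wanted at all, is to run the worker from the same image in its own DeploymentConfig. This is only a minimal sketch based on the configuration above, not the setup described in this answer; the object name, the :latest image reference, the command path (which assumes the S2I working directory /opt/app-root/src) and the concurrency flag are assumptions:
apiVersion: v1
kind: DeploymentConfig
metadata:
  name: django-celery-worker
  namespace: myproject
spec:
  replicas: 1
  selector:
    app: django-celery-worker
  template:
    metadata:
      labels:
        app: django-celery-worker
    spec:
      containers:
      - name: celery-worker
        # same image stream as the web pod; using the tag instead of the digest is an assumption
        image: docker-registry.default.svc:5000/myproject/django:latest
        # assumes manage.py lives under wsgi/podpub relative to /opt/app-root/src
        command: ["python", "wsgi/podpub/manage.py", "celery", "worker", "-c", "1"]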

Related

Redis deployed in AWS - connection timeout from localhost Spring Boot app

Small question regarding Redis deployed in AWS (not AWS ElastiCache) and an issue connecting to it.
Here is the setup of the Redis deployed in AWS: (pasting only the Kubernetes StatefulSet and Service)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      initContainers:
      - name: config
        image: redis:7.0.5-alpine
        command: [ "sh", "-c" ]
        args:
          - |
            cp /tmp/redis/redis.conf /etc/redis/redis.conf
            echo "finding master..."
            MASTER_FDQN=`hostname -f | sed -e 's/redis-[0-9]\./redis-0./'`
            if [ "$(redis-cli -h sentinel -p 5000 ping)" != "PONG" ]; then
              echo "master not found, defaulting to redis-0"
              if [ "$(hostname)" = "redis-0" ]; then
                echo "this is redis-0, not updating config..."
              else
                echo "updating redis.conf..."
                echo "slaveof $MASTER_FDQN 6379" >> /etc/redis/redis.conf
              fi
            else
              echo "sentinel found, finding master"
              MASTER="$(redis-cli -h sentinel -p 5000 sentinel get-master-addr-by-name mymaster | grep -E '(^redis-\d{1,})|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})')"
              echo "master found : $MASTER, updating redis.conf"
              echo "slaveof $MASTER 6379" >> /etc/redis/redis.conf
            fi
        volumeMounts:
        - name: redis-config
          mountPath: /etc/redis/
        - name: config
          mountPath: /tmp/redis/
      containers:
      - name: redis
        image: redis:7.0.5-alpine
        command: ["redis-server"]
        args: ["/etc/redis/redis.conf"]
        ports:
        - containerPort: 6379
          name: redis
        volumeMounts:
        - name: data
          mountPath: /data
        - name: redis-config
          mountPath: /etc/redis/
      volumes:
      - name: redis-config
        emptyDir: {}
      - name: config
        configMap:
          name: redis-config
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: nfs-1
      resources:
        requests:
          storage: 50Mi
---
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  ports:
  - port: 6379
    targetPort: 6379
    name: redis
  selector:
    app: redis
  type: LoadBalancer
The pods are healthy; I can exec into them and perform operations fine. Here is the get all:
NAME READY STATUS RESTARTS AGE
pod/redis-0 1/1 Running 0 22h
pod/redis-1 1/1 Running 0 22h
pod/redis-2 1/1 Running 0 22h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/redis LoadBalancer 192.168.45.55 10.51.5.2 6379:30315/TCP 26h
NAME READY AGE
statefulset.apps/redis 3/3 22h
Here is the describe of the service:
Name: redis
Namespace: Namespace
Labels: <none>
Annotations: <none>
Selector: app=redis
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 192.168.22.33
IPs: 192.168.22.33
LoadBalancer Ingress: 10.51.5.2
Port: redis 6379/TCP
TargetPort: 6379/TCP
NodePort: redis 30315/TCP
Endpoints: 192.xxx:6379,192.xxx:6379,192.xxx:6379
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal IPAllocated 68s metallb-controller Assigned IP ["10.51.5.2"]
Normal nodeAssigned 58s (x5 over 66s) metallb-speaker announcing from node "someaddress.com" with protocol "bgp"
Normal nodeAssigned 58s (x5 over 66s) metallb-speaker announcing from node "someaddress.com" with protocol "bgp"
I then try to connect to it, i.e. inserting some data with a very straightforward Spring Boot application. The application has no business logic, just trying to insert data.
Here are the relevant parts:
@Configuration
public class RedisConfiguration {

    @Bean
    public ReactiveRedisConnectionFactory reactiveRedisConnectionFactory() {
        return new LettuceConnectionFactory("10.51.5.2", 30315);
    }
}

@Repository
public class RedisRepository {

    private final ReactiveRedisOperations<String, String> reactiveRedisOperations;

    public RedisRepository(ReactiveRedisOperations<String, String> reactiveRedisOperations) {
        this.reactiveRedisOperations = reactiveRedisOperations;
    }

    public Mono<RedisPojo> save(RedisPojo redisPojo) {
        return reactiveRedisOperations.opsForValue().set(redisPojo.getInput(), redisPojo.getOutput()).map(__ -> redisPojo);
    }
}
Each time I try to write the data, I get this exception:
2022-12-02T20:20:08.015+08:00 ERROR 1184 --- [ctor-http-nio-3] a.w.r.e.AbstractErrorWebExceptionHandler : [8f16a752-1] 500 Server Error for HTTP POST "/save"
org.springframework.data.redis.RedisConnectionFailureException: Unable to connect to Redis
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$ExceptionTranslatingConnectionProvider.translateException(LettuceConnectionFactory.java:1602) ~[spring-data-redis-3.0.0.jar:3.0.0]
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
*__checkpoint ⇢ Handler com.redis.controller.RedisController#test(RedisRequest) [DispatcherHandler]
*__checkpoint ⇢ HTTP POST "/save" [ExceptionHandlingWebHandler]
Original Stack Trace:
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$ExceptionTranslatingConnectionProvider.translateException(LettuceConnectionFactory.java:1602) ~[spring-data-redis-3.0.0.jar:3.0.0]
Caused by: io.lettuce.core.RedisConnectionException: Unable to connect to 10.51.5.2/<unresolved>:30315
at io.lettuce.core.RedisConnectionException.create(RedisConnectionException.java:78) ~[lettuce-core-6.2.1.RELEASE.jar:6.2.1.RELEASE]
at io.lettuce.core.RedisConnectionException.create(RedisConnectionException.java:56) ~[lettuce-core-6.2.1.RELEASE.jar:6.2.1.RELEASE]
at io.lettuce.core.AbstractRedisClient.getConnection(AbstractRedisClient.java:350) ~[lettuce-core-6.2.1.RELEASE.jar:6.2.1.RELEASE]
at io.lettuce.core.RedisClient.connect(RedisClient.java:216) ~[lettuce-core-6.2.1.RELEASE.jar:6.2.1.RELEASE]
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: /10.51.5.2:30315
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:261) ~[netty-transport-4.1.85.Final.jar:4.1.85.Final]
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[netty-common-4.1.85.Final.jar:4.1.85.Final]
This is particularly puzzling, because I am quite sure the code of the Spring Boot app is working. When I change the target of return new LettuceConnectionFactory("10.51.5.2", 30315); to
a regular Redis on my laptop ("localhost", 6379),
a dockerized Redis on my laptop,
a dockerized Redis on prem,
all of them work fine.
Therefore, I am quite puzzled about what I did wrong with the setup of this Redis in AWS.
What should I do in order to connect to it properly?
May I get some help please? Thank you
By default, Redis binds itself to the IP addresses 127.0.0.1 and ::1 and does not accept connections on non-local interfaces. Chances are high that this is your main issue, and you may want to review your redis.conf file to bind Redis to the interface you need, or to the generic * -::*, as explained in the comments of the config file itself.
With that being said, Redis also does not accept connections on non-local interfaces if the default user has no password - a security layer named Protected mode. Thus you should either give your default user a password or disable protected mode in your redis.conf file.
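Since redis.conf in this setup is shipped through the redis-config ConfigMap, the bind and protected-mode changes above could be made there. A minimal sketch, assuming the rest of the file stays as it already is:
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
data:
  redis.conf: |
    # listen on all interfaces instead of only 127.0.0.1 and ::1
    bind * -::*
    # either disable protected mode ...
    protected-mode no
    # ... or keep it enabled and give the default user a password instead:
    # requirepass <your-password>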
Not sure if this applies to your case but, as a side note, I would suggest to always avoid exposing Redis to the Internet.
You are mixing 2 things.
To make this service reachable from pods in different namespaces you do not need an external load balancer; you can just use the redis.namespace-name:6379 DNS name and it will just work. Such a DNS name exists for every Service you create (but it only resolves inside Kubernetes).
Kubernetes will make sure that your traffic is routed to the proper pods (assuming there is more than one).
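As an illustration of that first point, the Service from the question could simply stay internal; without the LoadBalancer type it defaults to ClusterIP and is reachable from inside the cluster as redis.<namespace>:6379 (a sketch; only the type line differs from the manifest above):
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  ports:
    - port: 6379
      targetPort: 6379
      name: redis
  selector:
    app: redis
  # no "type: LoadBalancer" -> defaults to ClusterIP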
If you want to expose Redis from outside of Kubernetes, then you need to make sure there is connectivity from the outside, and then you need a network load balancer that forwards traffic to your Kubernetes service (in your case the NodePort, so you need an NLB with the EKS worker nodes on port 30315 as targets).
If your worker nodes have public IPs and their security groups allow connecting to them directly, you could try to connect to a worker node's IP directly just to test things out (without the LB).
And regardless of your setup, you can always create a proxy via kubectl:
kubectl port-forward -n redisNS svc/redis 6379:6379
and connect from the Spring Boot app to localhost:6379.
How do you want to connect from app to redis in a final setup?

use AWS Secrets & Configuration Provider for EKS: Error from server (BadRequest)

I'm following this AWS documentation which explains how to properly configure AWS Secrets Manager to make it work with EKS through Kubernetes Secrets.
I successfully followed step by step all the different commands as explained in the documentation.
The only difference I get is related to this step where I have to run:
kubectl get po --namespace=kube-system
The expected output should be:
csi-secrets-store-qp9r8 3/3 Running 0 4m
csi-secrets-store-zrjt2 3/3 Running 0 4m
but instead I get:
csi-secrets-store-provider-aws-lxxcz 1/1 Running 0 5d17h
csi-secrets-store-provider-aws-rhnc6 1/1 Running 0 5d17h
csi-secrets-store-secrets-store-csi-driver-ml6jf 3/3 Running 0 5d18h
csi-secrets-store-secrets-store-csi-driver-r5cbk 3/3 Running 0 5d18h
As you can see the names are different, but I'm quite sure it's ok :-)
The real problem starts here in step 4: I created the following YAML file (as you can see I added some parameters):
apiVersion: secrets-store.csi.x-k8s.io/v1alpha1
kind: SecretProviderClass
metadata:
  name: aws-secrets
spec:
  provider: aws
  parameters:
    objects: |
      - objectName: "mysecret"
        objectType: "secretsmanager"
And finally I created a deployment (as explained here in step 5) using the following YAML file:
# test-deployment.yaml
kind: Pod
apiVersion: v1
metadata:
  name: nginx-secrets-store-inline
spec:
  serviceAccountName: iamserviceaccountforkeyvaultsecretmanagerresearch
  containers:
  - image: nginx
    name: nginx
    volumeMounts:
    - name: mysecret-volume
      mountPath: "/mnt/secrets-store"
      readOnly: true
  volumes:
  - name: mysecret-volume
    csi:
      driver: secrets-store.csi.k8s.io
      readOnly: true
      volumeAttributes:
        secretProviderClass: "aws-secrets"
After the deployment through the command:
kubectl apply -f test-deployment.yaml -n mynamespace
The pod is not able to start properly because the following error is generated:
Error from server (BadRequest): container "nginx" in pod "nginx-secrets-store-inline" is waiting to start: ContainerCreating
But, for example, if I run the deployment with the following YAML, the pod is created successfully:
# test-deployment.yaml
kind: Pod
apiVersion: v1
metadata:
  name: nginx-secrets-store-inline
spec:
  serviceAccountName: iamserviceaccountforkeyvaultsecretmanagerresearch
  containers:
  - image: nginx
    name: nginx
    volumeMounts:
    - name: keyvault-credential-volume
      mountPath: "/mnt/secrets-store"
      readOnly: true
  volumes:
  - name: keyvault-credential-volume
    emptyDir: {} # <<== !! LOOK HERE !!
as you can see I used
emptyDir: {}
So as far as I can see, the problem here is related to the following YAML lines:
csi:
  driver: secrets-store.csi.k8s.io
  readOnly: true
  volumeAttributes:
    secretProviderClass: "aws-secrets"
To be honest, it's not even clear in my mind what's happening here.
Probably I didn't properly enable the volume permissions in EKS?
Sorry, but I'm a newbie in both AWS and Kubernetes configurations.
Thanks for your time
--- NEW INFO ---
If I run
kubectl describe pod nginx-secrets-store-inline -n mynamespace
where nginx-secrets-store-inline is the name of the pod, I get the following output:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 30s default-scheduler Successfully assigned mynamespace/nginx-secrets-store-inline to ip-10-0-24-252.eu-central-1.compute.internal
Warning FailedMount 14s (x6 over 29s) kubelet MountVolume.SetUp failed for volume "keyvault-credential-volume" : rpc error: code = Unknown desc = failed to get secretproviderclass mynamespace/aws-secrets, error: SecretProviderClass.secrets-store.csi.x-k8s.io "aws-secrets" not found
Any hints?
Finally I realized why it wasn't working. As explained here, the error:
Warning FailedMount 3s (x4 over 6s) kubelet, kind-control-plane MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to get secretproviderclass default/azure, error: secretproviderclasses.secrets-store.csi.x-k8s.io "azure" not found
is related to namespace:
The SecretProviderClass being referenced in the volumeMount needs to exist in the same namespace as the application pod.
So both YAML files should be deployed in the same namespace (adding, for example, the -n mynamespace argument).
Finally I got it working!
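To make the namespace requirement explicit, the SecretProviderClass can also carry the namespace in its own metadata rather than relying only on the -n flag at apply time. A minimal sketch, reusing the names from the question:
apiVersion: secrets-store.csi.x-k8s.io/v1alpha1
kind: SecretProviderClass
metadata:
  name: aws-secrets
  namespace: mynamespace   # must match the namespace of the pod that mounts it
spec:
  provider: aws
  parameters:
    objects: |
      - objectName: "mysecret"
        objectType: "secretsmanager"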

helm single template for multiple similar deployments

Apologies if this is a really simple question
I have 2 applications that can potentially share the same template:
applications:
  #
  app1:
    containerName: app1
    replicaCount: 10
    logLevel: warn
    queue: queue1
  #
  app2:
    containerName: app2
    replicaCount: 20
    logLevel: info
    queue: queue2
...
If I create a single template for both apps, is there a wildcard or variable I can use that will iterate over both of the apps (i.e. app1 or app2)? E.g. the bit where I've put <SOMETHING_HERE> below...
spec:
  env:
    - name: LOG_LEVEL
      value: "{{ .Values.applications.<SOMETHING_HERE>.logLevel }}"
Currently (which I'm sure is not very efficient) I have two separate templates that each define their own app, e.g.
app1_template.yaml
{{ .Values.applications.app1.logLevel }}
app2_template.yaml
{{ .Values.applications.app2.logLevel }}
which I'm pretty sure is not the way I'm supposed to do it.
Any help on this would be greatly appreciated.
One of the solutions would be to have one template and multiple values files, one per deployment/environment:
spec:
  env:
    - name: LOG_LEVEL
      value: "{{ .Values.logLevel }}"
values-app1.yaml:
containerName: app1
replicaCount: 10
logLevel: warn
queue: queue1
values-app2.yaml:
containerName: app2
replicaCount: 20
logLevel: info
queue: queue2
Then specify which values file should be used by adding this to the helm command:
APP=app1 # or app2
helm upgrade --install . --values ./values-${APP}.yaml
You can also have shared values, let's say in a regular values.yaml, and provide multiple files:
APP=app1
helm upgrade --install . --values ./values.yaml --values ./values-${APP}.yaml
You can just use a single values file as you've done, and then set the app name when you run helm:
helm upgrade --install app1 ./charts --set app=app1
and
helm upgrade --install app2 ./charts --set app=app2
Then in your templates use:
spec:
  env:
    - name: LOG_LEVEL
      value: "{{ (index .Values.applications .Values.app).logLevel }}"
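If the goal is really to iterate over both apps and render one set of resources per entry, rather than picking a single app at install time, a range loop over .Values.applications is another option. This is only a sketch, not part of either answer above; the Deployment shape and the image name are placeholders:
{{- range $appName, $app := .Values.applications }}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ $appName }}
spec:
  replicas: {{ $app.replicaCount }}
  selector:
    matchLabels:
      app: {{ $appName }}
  template:
    metadata:
      labels:
        app: {{ $appName }}
    spec:
      containers:
        - name: {{ $app.containerName }}
          image: my-registry/my-image:latest   # placeholder image
          env:
            - name: LOG_LEVEL
              value: "{{ $app.logLevel }}"
            - name: QUEUE
              value: "{{ $app.queue }}"
{{- end }}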

Nginx upstream server values when serving an API using docker-compose and kubernetes

I'm trying to use docker-compose and kubernetes as two different solutions to setup a Django API served by Gunicorn (as the web server) and Nginx (as the reverse proxy). Here are the key files:
default.tmpl (nginx) - this is converted to default.conf when the environment variable is filled in:
upstream api {
    server ${UPSTREAM_SERVER};
}

server {
    listen 80;

    location / {
        proxy_pass http://api;
    }

    location /staticfiles {
        alias /app/static/;
    }
}
docker-compose.yaml:
version: '3'

services:
  api-gunicorn:
    build: ./api
    command: gunicorn --bind=0.0.0.0:8000 api.wsgi:application
    volumes:
      - ./api:/app

  api-proxy:
    build: ./api-proxy
    command: /bin/bash -c "envsubst < /etc/nginx/conf.d/default.tmpl > /etc/nginx/conf.d/default.conf && exec nginx -g 'daemon off;'"
    environment:
      - UPSTREAM_SERVER=api-gunicorn:8000
    ports:
      - 80:80
    volumes:
      - ./api/static:/app/static
    depends_on:
      - api-gunicorn
api-deployment.yaml (kubernetes):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: release-name-myapp-api-proxy
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: myapp-api-proxy
  template:
    metadata:
      labels:
        app.kubernetes.io/name: myapp-api-proxy
    spec:
      containers:
      - name: myapp-api-gunicorn
        image: "helm-django_api-gunicorn:latest"
        imagePullPolicy: Never
        command:
        - "/bin/bash"
        args:
        - "-c"
        - "gunicorn --bind=0.0.0.0:8000 api.wsgi:application"
      - name: myapp-api-proxy
        image: "helm-django_api-proxy:latest"
        imagePullPolicy: Never
        command:
        - "/bin/bash"
        args:
        - "-c"
        - "envsubst < /etc/nginx/conf.d/default.tmpl > /etc/nginx/conf.d/default.conf && exec nginx -g 'daemon off;'"
        env:
        - name: UPSTREAM_SERVER
          value: 127.0.0.1:8000
        volumeMounts:
        - mountPath: /app/static
          name: api-static-assets-on-host-mount
      volumes:
      - name: api-static-assets-on-host-mount
        hostPath:
          path: /Users/jonathan.metz/repos/personal/code-demos/kubernetes-demo/helm-django/api/static
My question involves the UPSTREAM_SERVER environment variable.
For docker-compose.yaml, the following values have worked for me:
Setting it to the name of the gunicorn service and the port it's running on (in this case api-gunicorn:8000). This is the best way to do it (and how I've done it in the docker-compose file above) because I don't need to expose the 8000 port to the host machine.
Setting it to MY_IP_ADDRESS:8000 as described in this SO post. This method requires me to expose the 8000 port, which is not ideal.
For api-deployment.yaml, only the following value has worked for me:
Setting it to localhost:8000. Inside of a pod, all containers can communicate using localhost.
Are there any other values for UPSTREAM_SERVER that work here, especially in the kubernetes file? I feel like I should be able to point to the container's name and that should work.
You could create a Service targeting the myapp-api-gunicorn container, but this will expose it outside of the pod:
apiVersion: v1
kind: Service
metadata:
  name: api-gunicorn-service
spec:
  selector:
    app.kubernetes.io/name: myapp-api-proxy
  ports:
    - protocol: TCP
      port: 8000
      targetPort: 8000
You might also use hostname and subdomain fields inside a pod to take advantage of FQDN.
Currently when a pod is created, its hostname is the Pod’s metadata.name value.
The Pod spec has an optional hostname field, which can be used to specify the Pod’s hostname. When specified, it takes precedence over the Pod’s name to be the hostname of the pod. For example, given a Pod with hostname set to “my-host”, the Pod will have its hostname set to “my-host”.
The Pod spec also has an optional subdomain field which can be used to specify its subdomain. For example, a Pod with hostname set to “foo”, and subdomain set to “bar”, in namespace “my-namespace”, will have the fully qualified domain name (FQDN) “foo.bar.my-namespace.svc.cluster-domain.example”.
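A minimal sketch of that hostname/subdomain idea, using placeholder names foo and bar: the pod becomes resolvable as foo.bar.<namespace>.svc.cluster.local as long as a headless Service named after the subdomain exists and selects the pod.
apiVersion: v1
kind: Service
metadata:
  name: bar                  # must match the pod's subdomain field
spec:
  clusterIP: None            # headless service
  selector:
    app.kubernetes.io/name: myapp-api-proxy
  ports:
    - port: 8000
      targetPort: 8000
---
apiVersion: v1
kind: Pod
metadata:
  name: myapp-api
  labels:
    app.kubernetes.io/name: myapp-api-proxy
spec:
  hostname: foo
  subdomain: bar
  containers:
    - name: myapp-api-gunicorn
      image: "helm-django_api-gunicorn:latest"
      imagePullPolicy: Never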
Also here is a nice article from Mirantis which talks about exposing multiple containers in a pod

CrashLoopBackOff Error when deploying Django app on GKE (Kubernetes)

Folks,
What problem now still persists:
I have now gone beyond the code getting stuck on CrashLoopBackOff by fixing the Dockerfile run command as suggested by Emil Gi; however, the external IP is not forwarding to my pod's library app server.
Status
Fixed the port to 8080 in the Dockerfile and ensured it is consistent across the Dockerfile and library.yaml
Made sure the Dockerfile has proper commands so that it doesn't terminate immediately post startup; this was what was causing the CrashLoopBackOff
The problem is still that the load balancer external IP I click on gives this error: "This site can't be reached. 34.93.141.11 refused to connect."
Original Question:
How do I resolve this CrashLoopBackOff? I looked at many docs and tried debugging, but I am unsure what is causing it. The app runs perfectly in local mode, and it even deploys smoothly to App Engine standard, but not on GKE. Any pointers to debug this further are most appreciated.
Problem: the cloudsql-proxy container is running, but the library-app container is hitting a CrashLoopBackOff error. The pod is assigned to a node, pulls the images, starts them, and then goes into this BackOff state.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
library-7699b84747-9skst 1/2 CrashLoopBackOff 28 121m
$ kubectl logs library-7699b84747-9skst
Error from server (BadRequest): a container name must be specified for pod library-7699b84747-9skst, choose one of: [library-app cloudsql-proxy]
​$ kubectl describe pods library-7699b84747-9skst
Name: library-7699b84747-9skst
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: gke-library-default-pool-35b5943a-ps5v/10.160.0.13
Start Time: Fri, 06 Dec 2019 09:34:11 +0530
Labels: app=library
pod-template-hash=7699b84747
Annotations: kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container library-app; cpu request for container cloudsql-proxy
Status: Running
IP: 10.16.0.10
Controlled By: ReplicaSet/library-7699b84747
Containers:
library-app:
Container ID: docker://e7d8aac3dff318de34f750c3f1856cd754aa96a7203772de748b3e397441a609
Image: gcr.io/library-259506/library
Image ID: docker-pullable://gcr.io/library-259506/library#sha256:07f54e055621ab6ddcbb49666984501cf98c95133bcf7405ca076322fb0e4108
Port: 8080/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 06 Dec 2019 09:35:07 +0530
Finished: Fri, 06 Dec 2019 09:35:07 +0530
Ready: False
Restart Count: 2
Requests:
cpu: 100m
Environment:
DATABASE_USER: <set to the key 'username' in secret 'cloudsql'> Optional: false
DATABASE_PASSWORD: <set to the key 'password' in secret 'cloudsql'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-kj497 (ro)
cloudsql-proxy:
Container ID: docker://352284231e7f02011dd1ab6999bf9a283b334590435278442e9a04d4d0684405
Image: gcr.io/cloudsql-docker/gce-proxy:1.16
Image ID: docker-pullable://gcr.io/cloudsql-docker/gce-proxy#sha256:7d302c849bebee8a3fc90a2705c02409c44c91c813991d6e8072f092769645cf
Port: <none>
Host Port: <none>
Command:
/cloud_sql_proxy
--dir=/cloudsql
-instances=library-259506:asia-south1:library=tcp:3306
-credential_file=/secrets/cloudsql/credentials.json
State: Running
Started: Fri, 06 Dec 2019 09:34:51 +0530
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/cloudsql from cloudsql (rw)
/etc/ssl/certs from ssl-certs (rw)
/secrets/cloudsql from cloudsql-oauth-credentials (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-kj497 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
cloudsql-oauth-credentials:
Type: Secret (a volume populated by a Secret)
SecretName: cloudsql-oauth-credentials
Optional: false
ssl-certs:
Type: HostPath (bare host directory volume)
Path: /etc/ssl/certs
HostPathType:
cloudsql:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
default-token-kj497:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-kj497
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 86s default-scheduler Successfully assigned default/library-7699b84747-9skst to gke-library-default-pool-35b5943a-ps5v
Normal Pulling 50s kubelet, gke-library-default-pool-35b5943a-ps5v pulling image "gcr.io/cloudsql-docker/gce-proxy:1.16"
Normal Pulled 47s kubelet, gke-library-default-pool-35b5943a-ps5v Successfully pulled image "gcr.io/cloudsql-docker/gce-proxy:1.16"
Normal Created 46s kubelet, gke-library-default-pool-35b5943a-ps5v Created container
Normal Started 46s kubelet, gke-library-default-pool-35b5943a-ps5v Started container
Normal Pulling 2s (x4 over 85s) kubelet, gke-library-default-pool-35b5943a-ps5v pulling image "gcr.io/library-259506/library"
Normal Created 1s (x4 over 50s) kubelet, gke-library-default-pool-35b5943a-ps5v Created container
Normal Started 1s (x4 over 50s) kubelet, gke-library-default-pool-35b5943a-ps5v Started container
Normal Pulled 1s (x4 over 52s) kubelet, gke-library-default-pool-35b5943a-ps5v Successfully pulled image "gcr.io/library-259506/library"
Warning BackOff 1s (x5 over 43s) kubelet, gke-library-default-pool-35b5943a-ps5v Back-off restarting failed container​
Here is the library.yaml file I have to go with it.
# [START kubernetes_deployment]
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: library
  labels:
    app: library
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: library
    spec:
      containers:
      - name: library-app
        # Replace with your project ID or use `make template`
        image: gcr.io/library-259506/library
        # This setting makes nodes pull the docker image every time before
        # starting the pod. This is useful when debugging, but should be turned
        # off in production.
        imagePullPolicy: Always
        env:
        # [START cloudsql_secrets]
        - name: DATABASE_USER
          valueFrom:
            secretKeyRef:
              name: cloudsql
              key: username
        - name: DATABASE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: cloudsql
              key: password
        # [END cloudsql_secrets]
        ports:
        - containerPort: 8080
      # [START proxy_container]
      - image: gcr.io/cloudsql-docker/gce-proxy:1.16
        name: cloudsql-proxy
        command: ["/cloud_sql_proxy", "--dir=/cloudsql",
                  "-instances=library-259506:asia-south1:library=tcp:3306",
                  "-credential_file=/secrets/cloudsql/credentials.json"]
        volumeMounts:
        - name: cloudsql-oauth-credentials
          mountPath: /secrets/cloudsql
          readOnly: true
        - name: ssl-certs
          mountPath: /etc/ssl/certs
        - name: cloudsql
          mountPath: /cloudsql
      # [END proxy_container]
      # [START volumes]
      volumes:
      - name: cloudsql-oauth-credentials
        secret:
          secretName: cloudsql-oauth-credentials
      - name: ssl-certs
        hostPath:
          path: /etc/ssl/certs
      - name: cloudsql
        emptyDir:
      # [END volumes]
# [END kubernetes_deployment]
---
# [START service]
# The library-svc service provides a load-balancing proxy over the polls app
# pods. By specifying the type as a 'LoadBalancer', Container Engine will
# create an external HTTP load balancer.
# The service directs traffic to the deployment by matching the service's
# selector to the deployment's label
#
# For more information about external HTTP load balancing see:
# https://cloud.google.com/container-engine/docs/load-balancer
apiVersion: v1
kind: Service
metadata:
  name: library-svc
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: library
# [END service]
More error status
Container 'library-app' keeps crashing.
CrashLoopBackOff
Reason
Container 'library-app' keeps crashing.
Check Pod's logs to see more details. Learn more
Source
library-7699b84747-9skst
Conditions
Initialized: True Ready: False ContainersReady: False PodScheduled: True
- lastProbeTime: null
  lastTransitionTime: "2019-12-06T06:03:43Z"
  message: 'containers with unready status: [library-app]'
  reason: ContainersNotReady
  status: "False"
  type: ContainersReady
Key Events
Back-off restarting failed container | BackOff | Dec 6, 2019, 9:34:54 AM | Dec 6, 2019, 12:24:26 PM | 779
pulling image "gcr.io/library-259506/library" | Pulling | Dec 6, 2019, 9:34:12 AM | Dec 6, 2019, 11:59:26 AM | 34
The Dockerfile is as follows (this fixed the CrashLoop btw):
FROM python:3
ENV PYTHONUNBUFFERED 1
RUN mkdir /code
WORKDIR /code
COPY requirements.txt /code/
RUN pip install -r requirements.txt
COPY . /code/
# Server
EXPOSE 8080
STOPSIGNAL SIGINT
ENTRYPOINT ["python", "manage.py"]
CMD ["runserver", "0.0.0.0:8080"]
I think a bunch of things all came together:
I found that the password to the db had a special character that needed to be put within quotes, and I made sure the port numbers were accurate across the Dockerfile and library.yaml. This made the secrets actually work; I had spotted a password mismatch issue in the logs.
IMPORTANT: the command-line fix from Emil G about ensuring my Dockerfile doesn't exit quickly; make sure the CMD actually works and runs your server.
IMPORTANT: finally I found a fix for the external IP not connecting to my server; see this thread where I explain what went wrong: basically I needed a securityContext where I had to fix runAsUser so the app does not run as root (see the sketch below): RunAsUser issue & Clicking external IP of load balancer -> Bad Request (400) on deploying Django app on GKE (Kubernetes) and db connection failing:
I also documented all steps to deploy step 1-15 and
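For reference, a minimal sketch of the securityContext change mentioned above, added to the pod template in library.yaml; the UID/GID values are placeholders, the point is simply not to run as root:
spec:
  template:
    spec:
      securityContext:
        runAsUser: 1000      # any non-root UID
        runAsGroup: 1000
        runAsNonRoot: true
      containers:
      - name: library-app
        image: gcr.io/library-259506/library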