I have an AWS EKS cluster, in which there are two pods running: one for a redis cache, and the other for a GraphQL API.
Now I will present the config files for my kubernetes cluster.
cache-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: chatto-cache-deployment
spec:
replicas: 1
selector:
matchLabels:
tier: cache
template:
metadata:
labels:
tier: cache
spec:
containers:
- name: cache-container
image: redis
imagePullPolicy: Always
resources:
limits:
memory: 512Mi
cpu: "1"
requests:
memory: 256Mi
cpu: "0.2"
cache-service.yaml
apiVersion: v1
kind: Service
metadata:
name: chatto-cache-service
spec:
type: ClusterIP
selector:
tier: cache
ports:
- protocol: TCP
port: 6379
targetPort: 6379
server-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: chatto-server-deployment
spec:
replicas: 1
selector:
matchLabels:
tier: server
template:
metadata:
labels:
tier: server
spec:
containers:
- name: server-container
image: rocketblast2481/chatto-server
imagePullPolicy: Always
resources:
limits:
memory: 512Mi
cpu: "1"
requests:
memory: 256Mi
cpu: "0.2"
env:
- name: DB_USERNAME
valueFrom:
configMapKeyRef:
name: env-config-map
key: DB_USERNAME
- name: DB_PASSWORD
valueFrom:
configMapKeyRef:
name: env-config-map
key: DB_PASSWORD
- name: DB_HOST
valueFrom:
configMapKeyRef:
name: env-config-map
key: DB_HOST
- name: DB_PORT
valueFrom:
configMapKeyRef:
name: env-config-map
key: DB_PORT
- name: DB_DATABASE
valueFrom:
configMapKeyRef:
name: env-config-map
key: DB_DATABASE
- name: REDIS_HOST
valueFrom:
configMapKeyRef:
name: env-config-map
key: REDIS_HOST
- name: REDIS_PORT
valueFrom:
configMapKeyRef:
name: env-config-map
key: REDIS_PORT
server-service.yaml
apiVersion: v1
kind: Service
metadata:
name: chatto-server-service
spec:
type: LoadBalancer
selector:
tier: server
ports:
- protocol: TCP
port: 80
targetPort: 80
Okay, here's the problem. As you can see, the server-service is of type LoadBalancer.
When I run the command, kubectl get services, this is the output:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
chatto-cache-service ClusterIP 10.100.73.117 <none> 6379/TCP 5h17m
chatto-server-service LoadBalancer 10.100.11.249 afe3adae38c0242c8be0795609ee8a6c-424128423.us-east-2.elb.amazonaws.com 80:31643/TCP 17m
kubernetes ClusterIP 10.100.0.1 <none> 443/TCP 6h35m
I copied the EXTERNAL-IP of the chatto-server-service and slapped it into Postman, but I get an error that the connection refused.
Could someone tell me why this might be happening? It might be because of the way I have configured the security groups, but I don't know for sure.
Thanks for any feedback and help and please let me know if you need any other information.
Edit:
Here is a screenshot of the load balancer:
Related
I have three containers in a pod: nginx, redis, custom django app. It seems like none of them talk to each other with kubernetes. In docker compose they do but I can't use docker compose in production.
The django container gets this error:
[2022-06-20 21:45:49,420: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Error 111 connecting to redis:6379. Connection refused..
Trying again in 32.00 seconds... (16/100)
and the nginx container starts but never shows any traffic. Trying to connect to localhost:8000 gets no reply.
Any idea whats wrong with my yml file?
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
creationTimestamp: null
name: djangonetwork
spec:
ingress:
- from:
- podSelector:
matchLabels:
io.kompose.network/djangonetwork: "true"
podSelector:
matchLabels:
io.kompose.network/djangonetwork: "true"
---
apiVersion: v1
data:
DB_HOST: db
DB_NAME: django_db
DB_PASSWORD: password
DB_PORT: "5432"
DB_USER: user
kind: ConfigMap
metadata:
creationTimestamp: null
labels:
io.kompose.service: web
name: envs--django
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
io.kompose.service: web
name: web
spec:
replicas: 1
selector:
matchLabels:
io.kompose.service: web
strategy:
type: Recreate
template:
metadata:
labels:
io.kompose.network/djangonetwork: "true"
io.kompose.service: web
spec:
containers:
- image: nginx:alpine
name: nginxcontainer
ports:
- containerPort: 8000
- image: redis:alpine
name: rediscontainer
ports:
- containerPort: 6379
resources: {}
- env:
- name: DB_HOST
valueFrom:
configMapKeyRef:
key: DB_HOST
name: envs--django
- name: DB_NAME
valueFrom:
configMapKeyRef:
key: DB_NAME
name: envs--django
- name: DB_PASSWORD
valueFrom:
configMapKeyRef:
key: DB_PASSWORD
name: envs--django
- name: DB_PORT
valueFrom:
configMapKeyRef:
key: DB_PORT
name: envs--django
- name: DB_USER
valueFrom:
configMapKeyRef:
key: DB_USER
name: envs--django
image: localhost:5000/integration/web:latest
name: djangocontainer
ports:
- containerPort: 8000
resources: {}
restartPolicy: Always
status: {}
---
apiVersion: v1
kind: Service
metadata:
labels:
io.kompose.service: web
name: web
spec:
ports:
- name: "8000"
port: 8000
targetPort: 8000
selector:
io.kompose.service: web
You've put all three containers into a single Pod. That's usually not the preferred approach: it means you can't restart one of the containers without restarting all of them (any update to your application code requires discarding your Redis cache) and you can't individually scale the component parts (if you need five replicas of your application, do you also need five reverse proxies and can you usefully use five Redises?).
Instead, a preferred approach is to split these into three separate Deployments (or possibly use a StatefulSet for Redis with persistence). Each has a corresponding Service, and then those Service names can be used as DNS names.
A very minimal example for Redis could look like:
apiVersion: apps/v1
kind: Deployment
metadata:
name: redis
spec:
replicas: 1
template:
metadata:
labels:
service: web
component: redis
spec:
containers:
- name: redis
image: redis
ports:
- name: redis
containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
name: redis # <-- this name will be a DNS name
spec:
selector: # matches the template: { metadata: { labels: } }
service: web
component: redis
ports:
- name: redis
port: 6379
targetPort: redis # matches a containerPorts: [{ name: }]
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: app
spec:
...
env:
- name: REDIS_HOST
value: redis # matches the Service
If all three parts are in the same Pod, then the Service can't really distinguish which part it's talking to. In principle, between these containers, they share a network namespace and need to talk to each other as localhost; the containers: [{ name: }] have no practical effect.
Due to security requirements, I need to install Istio offline on an AWS EKS cluster. I could not find any clear documentation on offline installation or recommendations. So I have used the “istioctl generate manifest > generated-manifest.yml” to save the installation files, I have replaced the 3 images (proxy, pilot, prometheus) with my images hosted in AWS ECR and applied the file with kubectl. I should mention that “istioctl apply manifest” times out for some reason. I have also created the “istio-system” namespace manually before installation.
I have used these 2 annotations for an internal Network Load Balancer:
service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
service.beta.kubernetes.io/aws-load-balancer-internal: "true"
The installation creates the CRDs, the pods and services. Here are my questions, if someone can point me into the right direction:
Where can I find some proper documentation for offline installation?
There is no istio-egressgateway with the default profile. Is the egress gateway not needed if I need to expose an application?
The NLB is created with 4 listeners (80, 443, 15443, 15021) but I can connect only to 15201, the others refuse the connection. Hence only the 15021 target group is healthy, the other 3 are not.
Adding below some of the relevant code:
kubectl describe service/istio-ingressgateway -n istio-system
------
Name: istio-ingressgateway
Namespace: istio-system
Labels: app=istio-ingressgateway
istio=ingressgateway
release=istio
Annotations: service.beta.kubernetes.io/aws-load-balancer-internal: true
service.beta.kubernetes.io/aws-load-balancer-type: nlb
Selector: app=istio-ingressgateway,istio=ingressgateway
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 172.20.212.217
IPs: 172.20.212.217
LoadBalancer Ingress: aaaa-bbb.elb.eu-west-2.amazonaws.com
Port: status-port 15021/TCP
TargetPort: 15021/TCP
NodePort: status-port 32094/TCP
Endpoints: 10.0.35.41:15021
Port: http2 80/TCP
TargetPort: 80/TCP
NodePort: http2 30890/TCP
Endpoints: 10.0.35.41:80
Port: https 443/TCP
TargetPort: 443/TCP
NodePort: https 32039/TCP
Endpoints: 10.0.35.41:443
Port: tls 15443/TCP
TargetPort: 15443/TCP
NodePort: tls 30690/TCP
Endpoints: 10.0.35.41:15443
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
sh-4.2$ kubectl describe service/istiod -n istio-system
Name: istiod
Namespace: istio-system
Labels: app=istiod
istio=pilot
istio.io/rev=default
release=istio
Annotations: <none>
Selector: app=istiod,istio=pilot
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 172.20.25.159
IPs: 172.20.25.159
Port: grpc-xds 15010/TCP
TargetPort: 15010/TCP
Endpoints: 10.0.43.252:15010
Port: https-dns 15012/TCP
TargetPort: 15012/TCP
Endpoints: 10.0.43.252:15012
Port: https-webhook 443/TCP
TargetPort: 15017/TCP
Endpoints: 10.0.43.252:15017
Port: http-monitoring 15014/TCP
TargetPort: 15014/TCP
Endpoints: 10.0.43.252:15014
Port: dns-tls 853/TCP
TargetPort: 15053/TCP
Endpoints: 10.0.43.252:15053
Session Affinity: None
Events: <none>
kind: CustomResourceDefinition
apiVersion: apiextensions.k8s.io/v1beta1
metadata:
name: adapters.config.istio.io
labels:
app: mixer
package: adapter
istio: mixer-adapter
chart: istio
heritage: Tiller
release: istio
annotations:
"helm.sh/resource-policy": keep
spec:
group: config.istio.io
names:
kind: adapter
plural: adapters
singular: adapter
categories:
- istio-io
- policy-istio-io
scope: Namespaced
subresources:
status: {}
versions:
- name: v1alpha2
served: true
storage: true
---
kind: CustomResourceDefinition
apiVersion: apiextensions.k8s.io/v1beta1
metadata:
name: templates.config.istio.io
labels:
app: mixer
package: template
istio: mixer-template
chart: istio
heritage: Tiller
release: istio
annotations:
"helm.sh/resource-policy": keep
spec:
group: config.istio.io
names:
kind: template
plural: templates
singular: template
categories:
- istio-io
- policy-istio-io
scope: Namespaced
subresources:
status: {}
versions:
- name: v1alpha2
served: true
storage: true
---
# Cni component is disabled.
# EgressGateways istio-egressgateway component is disabled.
# Resources for IngressGateways component
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
labels:
app: istio-ingressgateway
istio: ingressgateway
release: istio
name: istio-ingressgateway
namespace: istio-system
spec:
maxReplicas: 5
metrics:
- resource:
name: cpu
targetAverageUtilization: 80
type: Resource
minReplicas: 1
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: istio-ingressgateway
---
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: istio-ingressgateway
istio: ingressgateway
release: istio
name: istio-ingressgateway
namespace: istio-system
spec:
selector:
matchLabels:
app: istio-ingressgateway
istio: ingressgateway
strategy:
rollingUpdate:
maxSurge: 100%
maxUnavailable: 25%
template:
metadata:
annotations:
sidecar.istio.io/inject: "false"
labels:
app: istio-ingressgateway
chart: gateways
heritage: Tiller
istio: ingressgateway
release: istio
service.istio.io/canonical-name: istio-ingressgateway
service.istio.io/canonical-revision: latest
spec:
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- preference:
matchExpressions:
- key: beta.kubernetes.io/arch
operator: In
values:
- amd64
weight: 2
- preference:
matchExpressions:
- key: beta.kubernetes.io/arch
operator: In
values:
- ppc64le
weight: 2
- preference:
matchExpressions:
- key: beta.kubernetes.io/arch
operator: In
values:
- s390x
weight: 2
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: beta.kubernetes.io/arch
operator: In
values:
- amd64
- ppc64le
- s390x
containers:
- args:
- proxy
- router
- --domain
- $(POD_NAMESPACE).svc.cluster.local
- --proxyLogLevel=warning
- --proxyComponentLogLevel=misc:error
- --log_output_level=default:info
- --serviceCluster
- istio-ingressgateway
- --trust-domain=cluster.local
env:
- name: JWT_POLICY
value: third-party-jwt
- name: PILOT_CERT_PROVIDER
value: istiod
- name: CA_ADDR
value: istiod.istio-system.svc:15012
- name: NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: INSTANCE_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: HOST_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.hostIP
- name: SERVICE_ACCOUNT
valueFrom:
fieldRef:
fieldPath: spec.serviceAccountName
- name: CANONICAL_SERVICE
valueFrom:
fieldRef:
fieldPath: metadata.labels['service.istio.io/canonical-name']
- name: CANONICAL_REVISION
valueFrom:
fieldRef:
fieldPath: metadata.labels['service.istio.io/canonical-revision']
- name: ISTIO_META_WORKLOAD_NAME
value: istio-ingressgateway
- name: ISTIO_META_OWNER
value: kubernetes://apis/apps/v1/namespaces/istio-system/deployments/istio-ingressgateway
- name: ISTIO_META_MESH_ID
value: cluster.local
- name: ISTIO_META_ROUTER_MODE
value: sni-dnat
- name: ISTIO_META_CLUSTER_ID
value: Kubernetes
image: 777777777777.dkr.ecr.eu-west-2.amazonaws.com/istio-proxyv2:1.6.8
name: istio-proxy
ports:
- containerPort: 15021
- containerPort: 8080
- containerPort: 8443
- containerPort: 15443
- containerPort: 15011
- containerPort: 15012
- containerPort: 8060
- containerPort: 853
- containerPort: 15090
name: http-envoy-prom
protocol: TCP
readinessProbe:
failureThreshold: 30
httpGet:
path: /healthz/ready
port: 15021
scheme: HTTP
initialDelaySeconds: 1
periodSeconds: 2
successThreshold: 1
timeoutSeconds: 1
resources:
limits:
cpu: 2000m
memory: 1024Mi
requests:
cpu: 100m
memory: 128Mi
volumeMounts:
- mountPath: /etc/istio/proxy
name: istio-envoy
- mountPath: /etc/istio/config
name: config-volume
- mountPath: /var/run/secrets/istio
name: istiod-ca-cert
- mountPath: /var/run/secrets/tokens
name: istio-token
readOnly: true
- mountPath: /var/run/ingress_gateway
name: ingressgatewaysdsudspath
- mountPath: /etc/istio/pod
name: podinfo
- mountPath: /etc/istio/ingressgateway-certs
name: ingressgateway-certs
readOnly: true
- mountPath: /etc/istio/ingressgateway-ca-certs
name: ingressgateway-ca-certs
readOnly: true
serviceAccountName: istio-ingressgateway-service-account
volumes:
- configMap:
name: istio-ca-root-cert
name: istiod-ca-cert
- downwardAPI:
items:
- fieldRef:
fieldPath: metadata.labels
path: labels
- fieldRef:
fieldPath: metadata.annotations
path: annotations
name: podinfo
- emptyDir: {}
name: istio-envoy
- emptyDir: {}
name: ingressgatewaysdsudspath
- name: istio-token
projected:
sources:
- serviceAccountToken:
audience: istio-ca
expirationSeconds: 43200
path: istio-token
- configMap:
name: istio
optional: true
name: config-volume
- name: ingressgateway-certs
secret:
optional: true
secretName: istio-ingressgateway-certs
- name: ingressgateway-ca-certs
secret:
optional: true
secretName: istio-ingressgateway-ca-certs
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
name: istio-ingressgateway
namespace: istio-system
labels:
app: istio-ingressgateway
istio: ingressgateway
release: istio
spec:
minAvailable: 1
selector:
matchLabels:
app: istio-ingressgateway
istio: ingressgateway
release: istio
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: istio-ingressgateway-sds
namespace: istio-system
labels:
release: istio
rules:
- apiGroups: [""]
resources: ["secrets"]
verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: istio-ingressgateway-sds
namespace: istio-system
labels:
release: istio
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: istio-ingressgateway-sds
subjects:
- kind: ServiceAccount
name: istio-ingressgateway-service-account
---
apiVersion: v1
kind: Service
metadata:
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
service.beta.kubernetes.io/aws-load-balancer-internal: "true"
labels:
app: istio-ingressgateway
istio: ingressgateway
release: istio
name: istio-ingressgateway
namespace: istio-system
spec:
ports:
- name: status-port
port: 15021
targetPort: 15021
- name: http2
port: 80
targetPort: 80
- name: https
port: 443
targetPort: 443
- name: tls
port: 15443
targetPort: 15443
selector:
app: istio-ingressgateway
istio: ingressgateway
type: LoadBalancer
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: istio-ingressgateway-service-account
namespace: istio-system
labels:
app: istio-ingressgateway
istio: ingressgateway
release: istio
---
# IstiodRemote component is disabled.
# Resources for Pilot component
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: istiod
namespace: istio-system
labels:
app: istiod
release: istio
istio.io/rev: default
spec:
maxReplicas: 5
minReplicas: 1
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: istiod
metrics:
- type: Resource
resource:
name: cpu
targetAverageUtilization: 80
---
I'm trying to create a deployment on AWS EKS with my application and metricbeat as sidecar, so I have the following YML:
---
apiVersion: v1
kind: ConfigMap
metadata:
name: metricbeat-modules
namespace: testframework
labels:
k8s-app: metricbeat
data:
kubernetes.yml: |-
- module: kubernetes
metricsets:
- node
- system
- pod
- container
- volume
period: 10s
host: ${NODE_NAME}
hosts: [ "https://${NODE_IP}:10250" ]
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
ssl.verification_mode: "none"
---
apiVersion: v1
kind: ConfigMap
metadata:
name: metricbeat-config
namespace: testframework
labels:
k8s-app: metricbeat
data:
metricbeat.yml: |-
processors:
- add_cloud_metadata:
- add_tags:
tags: ["EKSCORP_DEV"]
target: "cluster_test"
metricbeat.config.modules:
path: ${path.config}/modules.d/*.yml
reload.enabled: false
output.elasticsearch:
index: "metricbeat-k8s-%{[agent.version]}-%{+yyyy.MM.dd}"
setup.template.name: "metricbeat-k8s"
setup.template.pattern: "metricbeat-k8s-*"
setup.ilm.enabled: false
cloud.id: ${ELASTIC_CLOUD_ID}
cloud.auth: ${ELASTIC_CLOUD_AUTH}
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: testframework-initializr-deploy
namespace: testframework
spec:
replicas: 1
selector:
matchLabels:
app: testframework-initializr
template:
metadata:
labels:
app: testframework-initializr
annotations:
co.elastic.logs/enabled: 'true'
co.elastic.logs/json.keys_under_root: 'true'
co.elastic.logs/json.add_error_key: 'true'
co.elastic.logs/json.message_key: 'message'
spec:
containers:
- name: testframework-initializr
image: XXXXX.dkr.ecr.us-east-1.amazonaws.com/testframework-initializr
ports:
- containerPort: 8080
livenessProbe:
httpGet:
path: /health/liveness
port: 8080
initialDelaySeconds: 300
periodSeconds: 10
timeoutSeconds: 60
failureThreshold: 5
readinessProbe:
httpGet:
port: 8080
path: /health
initialDelaySeconds: 300
periodSeconds: 10
timeoutSeconds: 10
failureThreshold: 3
- name: metricbeat-sidecar
image: docker.elastic.co/beats/metricbeat:7.12.0
args: [
"-c", "/etc/metricbeat.yml",
"-e",
"-system.hostfs=/hostfs"
]
env:
- name: ELASTIC_CLOUD_ID
value: xxxx
- name: ELASTIC_CLOUD_AUTH
value: xxxx
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: NODE_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
securityContext:
runAsUser: 0
volumeMounts:
- name: config
mountPath: /etc/metricbeat.yml
readOnly: true
subPath: metricbeat.yml
- name: modules
mountPath: /usr/share/metricbeat/modules.d
readOnly: true
volumes:
- name: config
configMap:
defaultMode: 0640
name: metricbeat-config
- name: modules
configMap:
defaultMode: 0640
name: metricbeat-modules
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: prom-admin
rules:
- apiGroups: [""]
resources: ["pods", "nodes"]
verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: prom-rbac
subjects:
- kind: ServiceAccount
name: default
namespace: testframework
roleRef:
kind: ClusterRole
name: prom-admin
apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: Service
metadata:
name: testframework-initializr-service
namespace: testframework
spec:
type: NodePort
ports:
- port: 80
targetPort: 8080
selector:
app: testframework-initializr
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
name: testframework-initializr-ingress
annotations:
kubernetes.io/ingress.class: alb
alb.ingress.kubernetes.io/scheme: internal
alb.ingress.kubernetes.io/target-type: ip
spec:
rules:
- host: dev-initializr.test.net
http:
paths:
- backend:
serviceName: testframework-initializr-service
servicePort: 80
Well, after startup the POD in AWS EKS, I got the following error in Kubernetes Metricbeat Container:
INFO module/wrapper.go:259 Error fetching data for metricset kubernetes.system: error doing HTTP request to fetch 'system' Metricset data: error making http request: Get "https://IP_FROM_FARGATE_HERE:10250/stats/summary": dial tcp IP_FROM_FARGATE_HERE:10250: connect: connection refused
I tried to use the "NODE_NAME" instead "NODE_IP", but I got "No Such Host". Any idea how can I fix it?
I have been trying to set up RabbitMQ on a k8s cluster, I finally got everything set up, but only one node shows up on the managementUI. Here are my steps:
1. Dockerfile Setup
I do this to enable autocluster:
FROM rabbitmq:3.8-rc-management-alpine
MAINTAINER kevlai
RUN rabbitmq-plugins --offline enable rabbitmq_peer_discovery_k8s
2. Set up RBAC
apiVersion: v1
kind: ServiceAccount
metadata:
name: borecast-rabbitmq
namespace: borecast-production
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: borecast-rabbitmq
namespace: borecast-production
rules:
- apiGroups:
- ""
resources:
- endpoints
verbs:
- get
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: borecast-rabbitmq
namespace: borecast-production
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: dev
subjects:
- kind: ServiceAccount
name: borecast-rabbitmq
namespace: borecast-production
3. Set up Secrets
apiVersion: v1
kind: Secret
metadata:
name: rabbitmq-secret
namespace: borecast-production
type: Opaque
data:
username: a2V2
password: Ym9yZWNhc3RydWx6
secretCookie: c2VjcmV0Y29va2llaGVyZQ==
4. Set up StorageClass
I'm setting up StorageClass so k8s will automatically do provision for me on AWS.
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
name: rabbitmq-sc
provisioner: kubernetes.io/aws-ebs
parameters:
type: gp2
zone: us-east-2a
reclaimPolicy: Retain
5. Set up StatefulSets and Services
You can see there are two services. The headless service is for the pods themselves. As for the management service, I'll expose the service for an Ingress controller in order for it to be accessible from outside.
---
apiVersion: v1
kind: Service
metadata:
name: borecast-rabbitmq-management-service
namespace: borecast-production
labels:
app: borecast-rabbitmq
spec:
ports:
- port: 15672
targetPort: 15672
name: http
- port: 5672
targetPort: 5672
name: amqp
selector:
app: borecast-rabbitmq
---
apiVersion: v1
kind: Service
metadata:
name: borecast-rabbitmq-service
namespace: borecast-production
labels:
app: borecast-rabbitmq
spec:
clusterIP: None
ports:
- port: 5672
name: amqp
selector:
app: borecast-rabbitmq
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
name: borecast-rabbitmq
namespace: borecast-production
spec:
serviceName: borecast-rabbitmq-service
replicas: 3
template:
metadata:
labels:
app: borecast-rabbitmq
spec:
serviceAccountName: borecast-rabbitmq
containers:
- image: docker.borecast.com/borecast-rabbitmq:v1.0.3
name: borecast-rabbitmq
imagePullPolicy: Always
resources:
requests:
memory: "256Mi"
cpu: "150m"
limits:
memory: "512Mi"
cpu: "250m"
ports:
- containerPort: 5672
name: amqp
env:
- name: RABBITMQ_DEFAULT_USER
valueFrom:
secretKeyRef:
name: rabbitmq-secret
key: username
- name: RABBITMQ_DEFAULT_PASS
valueFrom:
secretKeyRef:
name: rabbitmq-secret
key: password
- name: RABBITMQ_ERLANG_COOKIE
valueFrom:
secretKeyRef:
name: rabbitmq-secret
key: secretCookie
- name: MY_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: K8S_SERVICE_NAME
# value: borecast-rabbitmq-service.borecast-production.svc.cluster.local
value: borecast-rabbitmq-service
- name: RABBITMQ_USE_LONGNAME
value: "true"
- name: RABBITMQ_NODENAME
value: "rabbit#$(MY_POD_NAME).$(K8S_SERVICE_NAME)"
# value: rabbit#$(MY_POD_NAME).borecast-rabbitmq-service.borecast-production.svc.cluster.local
- name: RABBITMQ_NODE_TYPE
value: disc
- name: AUTOCLUSTER_TYPE
value: "k8s"
- name: AUTOCLUSTER_DELAY
value: "10"
- name: AUTOCLUSTER_CLEANUP
value: "true"
- name: CLEANUP_WARN_ONLY
value: "false"
- name: K8S_ADDRESS_TYPE
value: "hostname"
- name: K8S_HOSTNAME_SUFFIX
value: ".$(K8S_SERVICE_NAME)"
# value: .borecast-rabbitmq-service.borecast-production.svc.cluster.local
volumeMounts:
- name: rabbitmq-volume
mountPath: /var/lib/rabbitmq
imagePullSecrets:
- name: regcred
volumeClaimTemplates:
- metadata:
name: rabbitmq-volume
namespace: borecast-production
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: rabbitmq-sc
resources:
requests:
storage: 5Gi
Problem
Everything is working. However, when I access the management UI (i.e. I'm access the borecast-rabbitmq-management-service, port 15672), I only see one node showing up, when it should be three:
Also notice that the cluster name is
rabbit#borecast-rabbitmq-0.borecast-rabbitmq-service.borecast-production.svc.cluster.local
but when I log out and log in again, sometimes the number 0 will be changed to 1 or 2 for borecast-rabbitmq-0.
And also notice the node name is
rabbit#borecast-rabbitmq-1.borecast-rabbitmq-service
And you guessed it, sometimes the number is 2 or 0 for borecast-rabbitmq-1.
I have been trying to debug but to no avail. The logs for each pod doesn't raise any suspicions and every service and statefulset are working normally. I repeated the five steps multiple times, and if your cluster is on AWS, you can totally replicate my setup by following the steps (after creating the namespace borecast-production of course). If anybody can shed some light on the matter, I'll be eternally grateful.
The problem is with the headless service name definition:
- name: K8S_SERVICE_NAME
# value: borecast-rabbitmq-service.borecast-production.svc.cluster.local
value: borecast-rabbitmq-service
which is a building block of node name:
- name: RABBITMQ_NODENAME
value: "rabbit#$(MY_POD_NAME).$(K8S_SERVICE_NAME)"
whereas the proper node name, should be of FQDN of the POD (<statefulset name>-<ordinal index>.<headless_svc_name>.<namespace>.svc.cluster.local):
- name: RABBITMQ_NODENAME
value: "rabbit#$(MY_POD_NAME).$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local"
Therefore you ended up with NodeName
borecast-rabbitmq-1.borecast-rabbitmq-service
instead of:
borecast-rabbitmq-1.borecast-rabbitmq-service.borecast-production.svc.cluster.local
Look up the fqdn of the pod created by borecast-rabbitmq StatefulSet (in other word: SRV records of the Pods) with nslookup util from inside of your cluster as explained here, to see what form the RABBITMQ_NODENAME is expected to have.
try exposing 4369 for headless service;
https://www.rabbitmq.com/clustering.html
see the port access section
Had the same issue, and it came down to
Deleting all the rabbitmq resources including the pvc created under the statefulset.
Then reinstalling everything from the manifests.
I'm collecting Prometheus metrics from a uwsgi application hosted on Kubernetes, the metrics are not retained after the pods are deleted. Prometheus server is hosted on the same kubernetes cluster and I have assigned a persistent storage to it.
How do I retain the metrics from the pods even after they deleted?
The Prometheus deployment yaml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: prometheus
namespace: default
spec:
replicas: 1
template:
metadata:
labels:
app: prometheus
spec:
containers:
- name: prometheus
image: prom/prometheus
args:
- "--config.file=/etc/prometheus/prometheus.yml"
- "--storage.tsdb.path=/prometheus/"
- "--storage.tsdb.retention=2200h"
ports:
- containerPort: 9090
volumeMounts:
- name: prometheus-config-volume
mountPath: /etc/prometheus/
- name: prometheus-storage-volume
mountPath: /prometheus/
volumes:
- name: prometheus-config-volume
configMap:
defaultMode: 420
name: prometheus-server-conf
- name: prometheus-storage-volume
persistentVolumeClaim:
claimName: azurefile
---
apiVersion: v1
kind: Service
metadata:
labels:
app: prometheus
name: prometheus
spec:
type: LoadBalancer
loadBalancerIP: ...
ports:
- port: 80
protocol: TCP
targetPort: 9090
selector:
app: prometheus
Application deployment yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-app
spec:
replicas: 2
selector:
matchLabels:
app: api-app
template:
metadata:
labels:
app: api-app
spec:
containers:
- name: nginx
image: nginx
lifecycle:
preStop:
exec:
command: ["/usr/sbin/nginx","-s","quit"]
ports:
- containerPort: 80
protocol: TCP
resources:
limits:
cpu: 50m
memory: 100Mi
requests:
cpu: 10m
memory: 50Mi
volumeMounts:
- name: app-api
mountPath: /var/run/app
- name: nginx-conf
mountPath: /etc/nginx/conf.d
- name: api-app
image: azurecr.io/app_api_se:opencv
workingDir: /app
command: ["/usr/local/bin/uwsgi"]
args:
- "--die-on-term"
- "--manage-script-name"
- "--mount=/=api:app_dispatch"
- "--socket=/var/run/app/uwsgi.sock"
- "--chmod-socket=777"
- "--pyargv=se"
- "--metrics-dir=/storage"
- "--metrics-dir-restore"
resources:
requests:
cpu: 150m
memory: 1Gi
volumeMounts:
- name: app-api
mountPath: /var/run/app
- name: storage
mountPath: /storage
volumes:
- name: app-api
emptyDir: {}
- name: storage
persistentVolumeClaim:
claimName: app-storage
- name: nginx-conf
configMap:
name: app
tolerations:
- key: "sku"
operator: "Equal"
value: "test"
effect: "NoSchedule"
---
apiVersion: v1
kind: Service
metadata:
labels:
app: api-app
name: api-app
spec:
ports:
- port: 80
protocol: TCP
targetPort: 80
selector:
app: api-app
Your issue is with the wrong type of controller used to deploy Prometheus. The Deployment controller is wrong choice in this case (it's meant for Stateless applications, that don't need to maintain any persistence identifiers between Pods rescheduling - like persistence data).
You should switch to StatefulSet kind*, if you require persistence of data (metrics scraped by Prometheus) across Pod (re)scheduling.
*This is how Prometheus is deployed by default with prometheus-operator.
With this configuration for a volume, it will be removed when you release a pod. You are basically looking for a PersistentVolumne, documentation and example.
Also check, PersistentVolumeClaim.