argocd login redirect loop for SSO

I'm using Argo CD with JumpCloud for SSO. For some reason, it periodically gets into a weird state where clicking the login button either redirects back to the login page, returns a not-found error, or enters an infinite redirect loop. It's intermittent, and it can sometimes (but not always) be fixed by deleting cookies, clearing the cache, and/or restarting or deleting the argocd-server pods.
charts/argocd/values.yaml
environment: ""
argo-cd:
  redis-ha:
    enabled: true
  controller:
    replicas: 1
    metrics:
      enabled: true
      serviceMonitor:
        enabled: true
        namespace: kube-system
        additionalLabels:
          release: kube-prometheus-stack
    nodeSelector:
      eks.amazonaws.com/capacityType: ON_DEMAND
    args:
      appResyncPeriod: 20
    resources:
      requests:
        cpu: 1000m
        memory: 768Mi
  server:
    metrics:
      enabled: true
      serviceMonitor:
        enabled: true
        namespace: kube-system
        additionalLabels:
          release: kube-prometheus-stack
    nodeSelector:
      eks.amazonaws.com/capacityType: ON_DEMAND
    resources:
      requests:
        cpu: 50m
        memory: 128Mi
    rbacConfigCreate: false
    config:
      dex.config: |
        connectors:
          - type: ldap
            name: jumpcloud.com
            id: ad
            config:
              # LDAP server address
              host: ldap.jumpcloud.com:636
              insecureNoSSL: false
              insecureSkipVerify: true
              # Variable name that stores the LDAP bindDN in argocd-secret
              bindDN: "uid=dexservice,ou=Users,o=myorganizationid,dc=jumpcloud,dc=com"
              # Variable name that stores the LDAP bind password in argocd-secret
              bindPW: "mysecurepassword"
              usernamePrompt: Jumpcloud Username
              userSearch:
                baseDN: ou=Users,o=myorganizationid,dc=jumpcloud,dc=com
                username: uid
                idAttr: entryDN
                emailAttr: mail
                nameAttr: cn
              groupSearch:
                baseDN: ou=Users,o=myorganizationid,dc=jumpcloud,dc=com
                filter: "(|(cn=K8S-Admin)(cn=K8S-ReadOnly))"
                userMatchers:
                  - userAttr: entryDN
                    groupAttr: member
                nameAttr: cn
      repositories: |
        - url: git@github.com:myorgname/myreponame
          sshPrivateKeySecret:
            name: ssh-private-key-secret
            key: id_ed25519
    service:
      type: NodePort
  dex:
    nodeSelector:
      eks.amazonaws.com/capacityType: ON_DEMAND
  redis:
    nodeSelector:
      eks.amazonaws.com/capacityType: ON_DEMAND
  repoServer:
    replicas: 2
    nodeSelector:
      eks.amazonaws.com/capacityType: ON_DEMAND
  applicationSet:
    replicaCount: 2
  configs:
    params:
      server.insecure: true
charts/argocd/templates/argocd-rbac-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  labels:
    helm.sh/chart: {{ (index .Chart.Dependencies 0).Version }}
    app.kubernetes.io/component: server
    app.kubernetes.io/instance: argocd
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: argocd-rbac-cm
    app.kubernetes.io/part-of: argocd
    argocd.argoproj.io/instance: argocd
data:
  policy.default: role:none
  scopes: '[groups, email]'
  policy.csv: |
    p, role:none, *, *, */*, deny
    g, K8S-Admin, role:admin
    g, K8S-ReadOnly, "role:admin"
charts/argocd/templates/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ .Chart.Name }}
  annotations:
    "kubernetes.io/ingress.class": "alb"
    alb.ingress.kubernetes.io/group.name: {{ .Values.environment }}
    "alb.ingress.kubernetes.io/scheme": "internal"
    "alb.ingress.kubernetes.io/target-type": "instance"
    "alb.ingress.kubernetes.io/listen-ports": '[{"HTTP":80},{"HTTPS":443}]'
    "alb.ingress.kubernetes.io/backend-protocol": HTTP
    "alb.ingress.kubernetes.io/healthcheck-path": "/"
    "alb.ingress.kubernetes.io/actions.ssl-redirect": '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
    "external-dns.alpha.kubernetes.io/hostname": argocd.{{ .Values.hostName }}
spec:
  rules:
    - host: argocd.{{ .Values.hostName }}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ssl-redirect
                port:
                  name: use-annotation
          - path: /
            pathType: Prefix
            backend:
              service:
                name: argocd-server
                port:
                  number: 80

We're currently experiencing similar (weird) issues with our Argo CD stack, which uses Dex as an LDAP connector. The Dex logs show successful authentication, yet when returning to Argo CD, it intermittently seems to switch back to an OIDC-based login method and denies access with varying error messages completely unrelated to LDAP.
Recent communication on this topic in Argo's Slack channel showed that this is not a known issue, but it might have to do with running Argo CD in HA mode, i.e. using two or more argocd-server instances, which was exactly our setup.
I've scaled down argocd-server to one instance only for the time being. Until now, the issue hasn't resurfaced.
This might not be an exact answer, but it will hopefully provide a helpful hint as to where to look for a possible solution.
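For reference, a minimal sketch of what scaling the server down looks like with the chart values from the question, assuming the argo-cd dependency chart exposes server.replicas (recent chart versions do); the same effect can be had ad hoc with kubectl -n argocd scale deployment argocd-server --replicas=1:

argo-cd:
  server:
    # run a single argocd-server instance while the HA login issue is investigated
    replicas: 1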

Related

How to access a MinIO instance within a Kubernetes cluster?

I've deployed a MinIO server on Kubernetes with cdk8s, used minio1 as the service ID, and exposed port 9000.
My expectation was that I could access it at http://minio1:9000, but my MinIO server is unreachable from both Prometheus and the other instances in my namespace (Loki, Mimir, etc.). Is there a specific configuration I missed to enable access within the network? The server starts without error, so it looks like a networking issue.
I'm starting the server this way:
command: ["minio"],
args: [
  "server",
  "/data",
  "--address",
  `:9000`,
  "--console-address",
  `:9001`,
],
I patched the K8s configuration to expose both 9000 and 9001:
const d = ApiObject.of(minioDeployment);
// Create the empty port list
d.addJsonPatch(JsonPatch.add("/spec/template/spec/containers/0/ports", []));
// add port for console
d.addJsonPatch(
  JsonPatch.add("/spec/template/spec/containers/0/ports/0", {
    name: "console",
    containerPort: 9001,
  })
);
// add port for bucket
d.addJsonPatch(
  JsonPatch.replace("/spec/template/spec/containers/0/ports/1", {
    name: "bucket",
    containerPort: 9000,
  })
);
Can it be related to the multi-port configuration? Or is there a way to explicitly define the hostname as the service ID so it is accessible within the Kubernetes namespace?
Here are my Deployment and Service definitions generated by cdk8s:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    use-default-egress-policy: "true"
  name: minio1
  namespace: ns-monitoring
spec:
  minReadySeconds: 0
  progressDeadlineSeconds: 600
  replicas: 1
  selector:
    matchExpressions: []
    matchLabels:
      cdk8s.deployment: monitoring-stack-minio1-minio1-deployment-c8e6c44f
  strategy:
    rollingUpdate:
      maxSurge: 50%
      maxUnavailable: 50%
    type: RollingUpdate
  template:
    metadata:
      labels:
        cdk8s.deployment: monitoring-stack-minio1-minio1-deployment-c8e6c44f
    spec:
      automountServiceAccountToken: true
      containers:
        - args:
            - server
            - /data/minio/
            - --address
            - :9000
            - --console-address
            - :9001
          command:
            - minio
          env:
            - name: MINIO_ROOT_USER
              value: userminio
            - name: MINIO_ROOT_PASSWORD
              value: XXXXXXXXXXXXXX
            - name: MINIO_BROWSER
              value: "on"
            - name: MINIO_PROMETHEUS_AUTH_TYPE
              value: public
          image: minio/minio
          imagePullPolicy: Always
          name: minio1-docker
          ports:
            - containerPort: 9001
              name: console
            - containerPort: 9000
              name: bucket
          securityContext:
            privileged: false
            readOnlyRootFilesystem: false
            runAsNonRoot: false
          volumeMounts:
            - mountPath: /data
              name: data
      dnsConfig:
        nameservers: []
        options: []
        searches: []
      dnsPolicy: ClusterFirst
      hostAliases: []
      initContainers: []
      restartPolicy: Always
      securityContext:
        fsGroupChangePolicy: Always
        runAsNonRoot: false
        sysctls: []
      setHostnameAsFQDN: false
      volumes:
        - emptyDir: {}
          name: data
---
apiVersion: v1
kind: Service
metadata:
  labels:
    use-default-egress-policy: "true"
  name: minio1
  namespace: ns-monitoring
spec:
  externalIPs: []
  ports:
    - port: 443
      targetPort: 9001
  selector:
    cdk8s.deployment: stack-minio1-minio1-deployment-c8e6c44f
  type: ClusterIP
As mentioned in the documentation:
List of ports to expose from the container. Not specifying a port here DOES NOT prevent that port from being exposed. Any port which is listening on the default "0.0.0.0" address inside a container will be accessible from the network.
As @user2311578 suggested, exposing a port on the container level does not automatically make it available on the Service. You must specify it there when you want to access it through the Service (and its virtual IP).
Also check the GitHub link for more information.
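For what it's worth, a sketch of a Service that also exposes the S3 API port, based on the manifests above (note the selector must match the pod template label cdk8s.deployment exactly, or the Service ends up with no endpoints):

apiVersion: v1
kind: Service
metadata:
  name: minio1
  namespace: ns-monitoring
spec:
  type: ClusterIP
  selector:
    cdk8s.deployment: monitoring-stack-minio1-minio1-deployment-c8e6c44f
  ports:
    - name: bucket    # S3 API -> http://minio1.ns-monitoring.svc:9000
      port: 9000
      targetPort: 9000
    - name: console   # web console
      port: 9001
      targetPort: 9001

With a port 9000 entry in place, Prometheus and the other pods in the same namespace can reach the API at http://minio1:9000 (or http://minio1.ns-monitoring:9000 from other namespaces).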

Service can't be registered in Target groups

I'm new to using Kubernetes and AWS, so there are a lot of concepts I may not understand yet. I hope you can help me with this problem.
I have 3 services (frontend, backend and auth), each with its corresponding NodePort, and an Ingress that maps one host to each service. Everything runs on EKS, and for the Ingress deployment I'm using the AWS ingress controller. Once everything is deployed, I try to register the node group in the target groups: the frontend and auth services work correctly, but backend stays in an unhealthy state. I thought it could be a port problem, but if you look at auth and backend they have almost the same deployment definition, and both are APIs built with .NET Core. One thing to note is that I can do kubectl port-forward <backend-pod> 80:80 and it runs without problems. When I run kubectl describe ingresses I get this:
Name:             ingress
Labels:           app.kubernetes.io/managed-by=Helm
Namespace:        default
Address:          xxxxxxxxxxxxxxxxxxxxxxxxxxx.xxxxx.elb.amazonaws.com
Ingress Class:    <none>
Default backend:  <default>
Rules:
  Host             Path  Backends
  ----             ----  --------
  domain.com
                   /     front-service:default-port (10.0.1.183:80,10.0.2.98:80)
  back.domain.com
                   /     backend-service:default-port (<none>)
  auth.domain.com
                   /     auth-service:default-port (10.0.1.30:80,10.0.1.33:80)
Annotations:       alb.ingress.kubernetes.io/listen-ports: [{"HTTPS":443}, {"HTTP":80}]
                   alb.ingress.kubernetes.io/scheme: internet-facing
                   alb.ingress.kubernetes.io/ssl-redirect: 443
                   kubernetes.io/ingress.class: alb
Events:
  Type    Reason                  Age                   From     Message
  ----    ------                  ---                   ----     -------
  Normal  SuccessfullyReconciled  8m20s (x15 over 41h)  ingress  Successfully reconciled
Frontend
apiVersion: apps/v1
kind: Deployment
metadata:
  name: front
  labels:
    name: front
spec:
  replicas: 2
  selector:
    matchLabels:
      name: front
  template:
    metadata:
      labels:
        name: front
    spec:
      containers:
        - name: frontend
          image: {{ .Values.image }}
          imagePullPolicy: Always
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: wrfront-{{ .Values.namespace }}-service
spec:
  type: NodePort
  ports:
    - port: 80
      targetPort: 80
      name: default-port
      protocol: TCP
  selector:
    name: front
---
Auth
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-wrauth-keys
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
---
apiVersion: "v1"
kind: "ConfigMap"
metadata:
  name: "auth-config-ocpm"
  labels:
    app: "auth"
data:
  ASPNETCORE_URL: "http://+:80"
  ASPNETCORE_ENVIRONMENT: "Development"
  ASPNETCORE_LOGGINGCONSOLEDISABLECOLORS: "true"
---
apiVersion: "apps/v1"
kind: "Deployment"
metadata:
  name: "auth"
  labels:
    app: "auth"
spec:
  replicas: 2
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: "auth"
  template:
    metadata:
      labels:
        app: "auth"
    spec:
      volumes:
        - name: auth-keys-storage
          persistentVolumeClaim:
            claimName: pvc-wrauth-keys
      containers:
        - name: "api-auth"
          image: {{ .Values.image }}
          imagePullPolicy: Always
          ports:
            - containerPort: 80
          volumeMounts:
            - name: auth-keys-storage
              mountPath: "/app/auth-keys"
          env:
            - name: "ASPNETCORE_URL"
              valueFrom:
                configMapKeyRef:
                  key: "ASPNETCORE_URL"
                  name: "auth-config-ocpm"
            - name: "ASPNETCORE_ENVIRONMENT"
              valueFrom:
                configMapKeyRef:
                  key: "ASPNETCORE_ENVIRONMENT"
                  name: "auth-config-ocpm"
            - name: "ASPNETCORE_LOGGINGCONSOLEDISABLECOLORS"
              valueFrom:
                configMapKeyRef:
                  key: "ASPNETCORE_LOGGINGCONSOLEDISABLECOLORS"
                  name: "auth-config-ocpm"
---
apiVersion: v1
kind: Service
metadata:
  name: auth-service
spec:
  type: NodePort
  selector:
    app: auth
  ports:
    - name: default-port
      protocol: TCP
      port: 80
      targetPort: 80
Backend (Service with problem)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  labels:
    app: backend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
        - name: backend
          image: {{ .Values.image }}
          imagePullPolicy: Always
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: backend-service
spec:
  type: NodePort
  selector:
    name: backend
  ports:
    - name: default-port
      protocol: TCP
      port: 80
      targetPort: 80
Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    kubernetes.io/ingress.class: alb
    # SSL Settings
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
    alb.ingress.kubernetes.io/ssl-redirect: '443'
    alb.ingress.kubernetes.io/certificate-arn: {{ .Values.certificate }}
spec:
  rules:
    - host: {{ .Values.host }}
      http:
        paths:
          - path: /
            backend:
              service:
                name: front-service
                port:
                  name: default-port
            pathType: Prefix
    - host: back.{{ .Values.host }}
      http:
        paths:
          - path: /
            backend:
              service:
                name: backend-service
                port:
                  name: default-port
            pathType: Prefix
    - host: auth.{{ .Values.host }}
      http:
        paths:
          - path: /
            backend:
              service:
                name: auth-service
                port:
                  name: default-port
            pathType: Prefix
I've tried deploying other services and they work correctly; I've also tried running only the backend, or only one of the other services, but the same thing always happens, and always with the backend.
What could be happening? Is it a configuration problem? Some error in Ingress or Deployment? Or is it just the backend service?
I would be very grateful for any help.
domain.com
                 /     front-service:default-port (10.0.1.183:80,10.0.2.98:80)
back.domain.com
                 /     backend-service:default-port (<none>)
auth.domain.com
                 /     auth-service:default-port (10.0.1.30:80,10.0.1.33:80)
This is saying that your backend-service has no endpoints registered with the Ingress.
One thing to remember is that the Ingress registers Services by their pods' IPs, like "10.0.1.30:80" in your Ingress output, not by the NodePort. And according to the docs, I don't know why you can have multiple NodePort Services on the same port. But when you do port-forward, you actually open that port on all your instances (I assume you have 2), and then your ALB health-checks that port and reports healthy.
So I think your issue is that your Ingress cannot locate your backend service.
My suggestions are:
Try with only the backend-service, with its port changed, and maybe without the auth and frontend services. The default NodePort range is 30000-32767.
Go inside that pod (or create a new pod) and make a request to the service using its URL to check what it returns. By default, the ALB only accepts status 200 from its homepage.
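One more detail worth double-checking in the manifests above (an observation from the posted YAML, not necessarily the whole story): backend-service selects name: backend, while the backend pods are labeled app: backend, so the Service matches no pods and the Ingress reports <none> endpoints for it. A sketch of the corrected selector:

apiVersion: v1
kind: Service
metadata:
  name: backend-service
spec:
  type: NodePort
  selector:
    app: backend      # was "name: backend"; must match the pod template labels
  ports:
    - name: default-port
      protocol: TCP
      port: 80
      targetPort: 80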

GKE Ingress with NEGs: backend healthcheck doesn't pass

I have created GKE Ingress as follows:
apiVersion: cloud.google.com/v1beta1 # tried cloud.google.com/v1 as well
kind: BackendConfig
metadata:
  name: backend-config
  namespace: prod
spec:
  healthCheck:
    checkIntervalSec: 30
    port: 8080
    type: HTTP # case-sensitive
    requestPath: /healthcheck
  connectionDraining:
    drainingTimeoutSec: 60
---
apiVersion: v1
kind: Service
metadata:
  name: web-engine-service
  namespace: prod
  annotations:
    cloud.google.com/neg: '{"ingress": true}' # Creates a NEG after an Ingress is created.
    cloud.google.com/backend-config: '{"ports": {"web-engine-port":"backend-config"}}' # https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-features#associating_backendconfig_with_your_ingress
spec:
  selector:
    app: web-engine-pod
  ports:
    - name: web-engine-port
      protocol: TCP
      port: 8080
      targetPort: 5000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  labels:
    app: web-engine-deployment
    environment: prod
  name: web-engine-deployment
  namespace: prod
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: web-engine-pod
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      name: web-engine-pod
      labels:
        app: web-engine-pod
        environment: prod
    spec:
      containers:
        - image: my-image:my-tag
          imagePullPolicy: Always
          name: web-engine-1
          resources: {}
          ports:
            - name: flask-port
              containerPort: 5000
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /healthcheck
              port: 5000
            initialDelaySeconds: 30
            periodSeconds: 100
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
---
apiVersion: networking.gke.io/v1beta2
kind: ManagedCertificate
metadata:
  name: my-certificate
  namespace: prod
spec:
  domains:
    - api.mydomain.com # https://cloud.google.com/load-balancing/docs/ssl-certificates/google-managed-certs#renewal
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: prod-ingress
  namespace: prod
  annotations:
    kubernetes.io/ingress.allow-http: "false"
    kubernetes.io/ingress.global-static-ip-name: load-balancer-ip
    networking.gke.io/managed-certificates: my-certificate
spec:
  rules:
    - http:
        paths:
          - path: /model
            backend:
              serviceName: web-engine-service
              servicePort: 8080
I don't know what I'm doing wrong, because my healthchecks are not OK. And based on the perimeter logging I added to the app, nothing is even trying to hit that pod.
I've tried the BackendConfig with both 8080 and 5000.
By the way, it's not 100% clear from the docs whether the load balancer health check should be configured against the targetPort of the corresponding Pods or the port of the Service.
The healthcheck is registered with HTTP Load Balancer and Compute Engine:
It seems that something is not right with the Backend service IP.
The corresponding backend service configuration:
$ gcloud compute backend-services describe k8s1-85ef2f9a-prod-web-engine-service-8080-b938a707
...
affinityCookieTtlSec: 0
backends:
- balancingMode: RATE
  capacityScaler: 1.0
  group: https://www.googleapis.com/compute/v1/projects/wnd/zones/europe-west3-a/networkEndpointGroups/k8s1-85ef2f9a-prod-web-engine-service-8080-b938a707
  maxRatePerEndpoint: 1.0
connectionDraining:
  drainingTimeoutSec: 60
creationTimestamp: '2020-08-01T11:14:06.096-07:00'
description: '{"kubernetes.io/service-name":"prod/web-engine-service","kubernetes.io/service-port":"8080","x-features":["NEG"]}'
enableCDN: false
fingerprint: 5Vkqvg9lcRg=
healthChecks:
- https://www.googleapis.com/compute/v1/projects/wnd/global/healthChecks/k8s1-85ef2f9a-prod-web-engine-service-8080-b938a707
id: '2233674285070159361'
kind: compute#backendService
loadBalancingScheme: EXTERNAL
logConfig:
  enable: true
  sampleRate: 1.0
name: k8s1-85ef2f9a-prod-web-engine-service-8080-b938a707
port: 80
portName: port0
protocol: HTTP
selfLink: https://www.googleapis.com/compute/v1/projects/wnd/global/backendServices/k8s1-85ef2f9a-prod-web-engine-service-8080-b938a707
sessionAffinity: NONE
timeoutSec: 30
(port 80 looks really suspicious but I thought maybe it's just left there as default and is not in use when NEGs are configured).
Figured it out. By default, even the latest GKE clusters are created without IP alias support, which is also called VPC-native. I didn't even bother to check that initially because:
NEGs are supported out of the box; what's more, they seem to be the default, with no explicit annotation needed, on the GKE version I had (1.17.8-gke.17). It doesn't make sense not to enable IP aliases by default then, because it basically leaves the cluster in a non-functional state by default.
I didn't check VPC-native support initially because the name of the feature is simply misleading. I had extensive prior experience with AWS, and my faulty assumption was that VPC-native is like EC2-VPC, as opposed to EC2-Classic, which is legacy.
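For completeness, a hedged sketch of how this could be checked and addressed with gcloud (cluster and zone names are placeholders; an existing route-based cluster can't be converted in place, so a new VPC-native cluster is usually required):

# check whether the cluster is VPC-native (alias IPs)
gcloud container clusters describe my-cluster --zone europe-west3-a \
    --format="value(ipAllocationPolicy.useIpAliases)"

# create a VPC-native cluster; container-native load balancing with NEGs depends on it
gcloud container clusters create my-vpc-native-cluster \
    --zone europe-west3-a \
    --enable-ip-alias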

RabbitMQ only shows one node

I have been trying to set up RabbitMQ on a k8s cluster. I finally got everything set up, but only one node shows up in the management UI. Here are my steps:
1. Dockerfile Setup
I do this to enable autocluster:
FROM rabbitmq:3.8-rc-management-alpine
MAINTAINER kevlai
RUN rabbitmq-plugins --offline enable rabbitmq_peer_discovery_k8s
2. Set up RBAC
apiVersion: v1
kind: ServiceAccount
metadata:
  name: borecast-rabbitmq
  namespace: borecast-production
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: borecast-rabbitmq
  namespace: borecast-production
rules:
  - apiGroups:
      - ""
    resources:
      - endpoints
    verbs:
      - get
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: borecast-rabbitmq
  namespace: borecast-production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: dev
subjects:
  - kind: ServiceAccount
    name: borecast-rabbitmq
    namespace: borecast-production
3. Set up Secrets
apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-secret
  namespace: borecast-production
type: Opaque
data:
  username: a2V2
  password: Ym9yZWNhc3RydWx6
  secretCookie: c2VjcmV0Y29va2llaGVyZQ==
4. Set up StorageClass
I'm setting up a StorageClass so k8s will automatically do the provisioning for me on AWS.
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: rabbitmq-sc
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zone: us-east-2a
reclaimPolicy: Retain
5. Set up StatefulSets and Services
You can see there are two services. The headless service is for the pods themselves. As for the management service, I'll expose it to an Ingress controller so that it's accessible from outside.
---
apiVersion: v1
kind: Service
metadata:
  name: borecast-rabbitmq-management-service
  namespace: borecast-production
  labels:
    app: borecast-rabbitmq
spec:
  ports:
    - port: 15672
      targetPort: 15672
      name: http
    - port: 5672
      targetPort: 5672
      name: amqp
  selector:
    app: borecast-rabbitmq
---
apiVersion: v1
kind: Service
metadata:
  name: borecast-rabbitmq-service
  namespace: borecast-production
  labels:
    app: borecast-rabbitmq
spec:
  clusterIP: None
  ports:
    - port: 5672
      name: amqp
  selector:
    app: borecast-rabbitmq
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: borecast-rabbitmq
  namespace: borecast-production
spec:
  serviceName: borecast-rabbitmq-service
  replicas: 3
  template:
    metadata:
      labels:
        app: borecast-rabbitmq
    spec:
      serviceAccountName: borecast-rabbitmq
      containers:
        - image: docker.borecast.com/borecast-rabbitmq:v1.0.3
          name: borecast-rabbitmq
          imagePullPolicy: Always
          resources:
            requests:
              memory: "256Mi"
              cpu: "150m"
            limits:
              memory: "512Mi"
              cpu: "250m"
          ports:
            - containerPort: 5672
              name: amqp
          env:
            - name: RABBITMQ_DEFAULT_USER
              valueFrom:
                secretKeyRef:
                  name: rabbitmq-secret
                  key: username
            - name: RABBITMQ_DEFAULT_PASS
              valueFrom:
                secretKeyRef:
                  name: rabbitmq-secret
                  key: password
            - name: RABBITMQ_ERLANG_COOKIE
              valueFrom:
                secretKeyRef:
                  name: rabbitmq-secret
                  key: secretCookie
            - name: MY_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: K8S_SERVICE_NAME
              # value: borecast-rabbitmq-service.borecast-production.svc.cluster.local
              value: borecast-rabbitmq-service
            - name: RABBITMQ_USE_LONGNAME
              value: "true"
            - name: RABBITMQ_NODENAME
              value: "rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME)"
              # value: rabbit@$(MY_POD_NAME).borecast-rabbitmq-service.borecast-production.svc.cluster.local
            - name: RABBITMQ_NODE_TYPE
              value: disc
            - name: AUTOCLUSTER_TYPE
              value: "k8s"
            - name: AUTOCLUSTER_DELAY
              value: "10"
            - name: AUTOCLUSTER_CLEANUP
              value: "true"
            - name: CLEANUP_WARN_ONLY
              value: "false"
            - name: K8S_ADDRESS_TYPE
              value: "hostname"
            - name: K8S_HOSTNAME_SUFFIX
              value: ".$(K8S_SERVICE_NAME)"
              # value: .borecast-rabbitmq-service.borecast-production.svc.cluster.local
          volumeMounts:
            - name: rabbitmq-volume
              mountPath: /var/lib/rabbitmq
      imagePullSecrets:
        - name: regcred
  volumeClaimTemplates:
    - metadata:
        name: rabbitmq-volume
        namespace: borecast-production
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: rabbitmq-sc
        resources:
          requests:
            storage: 5Gi
Problem
Everything is working. However, when I access the management UI (i.e. the borecast-rabbitmq-management-service on port 15672), I only see one node showing up when there should be three:
Also notice that the cluster name is
rabbit@borecast-rabbitmq-0.borecast-rabbitmq-service.borecast-production.svc.cluster.local
but when I log out and log in again, sometimes the 0 changes to 1 or 2.
And also notice that the node name is
rabbit@borecast-rabbitmq-1.borecast-rabbitmq-service
and, you guessed it, sometimes the number is 2 or 0 instead of 1.
I have been trying to debug, but to no avail. The logs for each pod don't raise any suspicions, and every Service and StatefulSet is working normally. I repeated the five steps multiple times, and if your cluster is on AWS, you can replicate my setup exactly by following them (after creating the borecast-production namespace, of course). If anybody can shed some light on the matter, I'll be eternally grateful.
The problem is with the headless service name definition:
- name: K8S_SERVICE_NAME
  # value: borecast-rabbitmq-service.borecast-production.svc.cluster.local
  value: borecast-rabbitmq-service
which is a building block of the node name:
- name: RABBITMQ_NODENAME
  value: "rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME)"
whereas the proper node name should be the FQDN of the Pod (<statefulset name>-<ordinal index>.<headless_svc_name>.<namespace>.svc.cluster.local):
- name: RABBITMQ_NODENAME
  value: "rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local"
Therefore you ended up with the node name
borecast-rabbitmq-1.borecast-rabbitmq-service
instead of:
borecast-rabbitmq-1.borecast-rabbitmq-service.borecast-production.svc.cluster.local
Look up the FQDNs of the Pods created by the borecast-rabbitmq StatefulSet (in other words, the SRV records of the Pods) with the nslookup utility from inside your cluster, as explained here, to see what form the RABBITMQ_NODENAME is expected to have.
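Note that $(MY_POD_NAMESPACE) is not defined anywhere in the original StatefulSet; a minimal sketch of adding it via the downward API, together with the matching discovery suffix (consistent with the commented-out FQDN values in the question):

- name: MY_POD_NAMESPACE
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace
- name: K8S_HOSTNAME_SUFFIX
  value: ".$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local"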
Try exposing port 4369 on the headless service:
https://www.rabbitmq.com/clustering.html
See the port access section.
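A hedged sketch of what that looks like for the headless service above (4369 is epmd, which peer discovery relies on; 25672 is the inter-node communication port):

apiVersion: v1
kind: Service
metadata:
  name: borecast-rabbitmq-service
  namespace: borecast-production
  labels:
    app: borecast-rabbitmq
spec:
  clusterIP: None
  selector:
    app: borecast-rabbitmq
  ports:
    - port: 5672
      name: amqp
    - port: 4369
      name: epmd
    - port: 25672
      name: clustering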
Had the same issue, and it came down to:
Deleting all the RabbitMQ resources, including the PVCs created under the StatefulSet.
Then reinstalling everything from the manifests.

Kubernetes 1.4 SSL Termination on AWS

I have 6 HTTP micro-services. Currently they run in a crazy bash/custom deploy-tools setup (Dokku, mup).
I dockerized them and moved to Kubernetes on AWS (set up with kops). The last piece is converting my nginx config.
I'd like
All 6 to have SSL termination (not in the docker image)
4 need websockets and client IP session affinity (Meteor, Socket.io)
5 need http->https forwarding
1 serves the same content on http and https
I did 1 (SSL termination) by setting the service type to LoadBalancer and using the AWS-specific annotations. This created AWS load balancers, but it seems like a dead end for the other requirements.
I looked at Ingress, but don't see how to do it on AWS. Will this Ingress Controller work on AWS?
Do I need an nginx controller in each pod? This looked interesting, but I'm not sure how recent/relevant it is.
I'm not sure what direction to start in. What will work?
Mike
You should be able to use the nginx ingress controller to accomplish this.
SSL termination
Websocket support
http->https
Turn off the http->https redirect, as described in the link above
The README walks you through how to set it up, and there are plenty of examples.
The basic pieces you need to make this work are:
A default backend that will respond with 404 when there is no matching Ingress rule
The nginx ingress controller which will monitor your ingress rules and rewrite/reload nginx.conf whenever they change.
One or more ingress rules that describe how traffic should be routed to your services.
The end result is that you will have a single ELB that corresponds to your nginx ingress controller service, which in turn is responsible for routing to your individual services according to the ingress rules specified.
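As a hedged sketch for the one service that should serve identical content on http and https (the plain-service name and host are placeholders; the redirect annotation has been ingress.kubernetes.io/ssl-redirect in older controller releases and nginx.ingress.kubernetes.io/ssl-redirect in newer ones, so check the version you deploy):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: plain-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
    ingress.kubernetes.io/ssl-redirect: "false"   # serve the same content on http and https
spec:
  rules:
    - host: www.example.io
      http:
        paths:
          - path: /
            backend:
              serviceName: plain-service
              servicePort: http-port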
There may be a better way to do this. I wrote this answer because I asked the question; it's the best I could come up with using Pixel Elephant's doc links above.
The default-http-backend is very useful for debugging. +1
Ingress
this creates an endpoint on the node's IP address, which can change depending on where the Ingress Container is running
note the configmap at the bottom. Configured per environment.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: "nginx"
  name: all-ingress
spec:
  tls:
    - hosts:
        - admin-stage.example.io
      secretName: tls-secret
  rules:
    - host: admin-stage.example.io
      http:
        paths:
          - backend:
              serviceName: admin
              servicePort: http-port
            path: /
---
apiVersion: v1
data:
  enable-sticky-sessions: "true"
  proxy-read-timeout: "7200"
  proxy-send-timeout: "7200"
kind: ConfigMap
metadata:
  name: nginx-load-balancer-conf
App Service and Deployment
the service port needs to be named, or you may get "upstream default-admin-80 does not have any active endpoints. Using default backend"
apiVersion: v1
kind: Service
metadata:
  name: admin
spec:
  ports:
    - name: http-port
      port: 80
      protocol: TCP
      targetPort: http-port
  selector:
    app: admin
  sessionAffinity: ClientIP
  type: ClusterIP
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: admin
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: admin
      name: admin
    spec:
      containers:
        - image: example/admin:latest
          name: admin
          ports:
            - containerPort: 80
              name: http-port
          resources:
            requests:
              cpu: 500m
              memory: 1000Mi
          volumeMounts:
            - mountPath: /etc/env-volume
              name: config
              readOnly: true
      imagePullSecrets:
        - name: cloud.docker.com-pull
      volumes:
        - name: config
          secret:
            defaultMode: 420
            items:
              - key: admin.sh
                mode: 256
                path: env.sh
              - key: settings.json
                mode: 256
                path: settings.json
            secretName: env-secret
Ingress Nginx Docker Image
note default-ssl-certificate at bottom
logging is great; see the --v flag below
note the Service will create an ELB on AWS which can be used to configure DNS.
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress-service
spec:
  ports:
    - name: http-port
      port: 80
      protocol: TCP
      targetPort: http-port
    - name: https-port
      port: 443
      protocol: TCP
      targetPort: https-port
  selector:
    app: nginx-ingress-service
  sessionAffinity: None
  type: LoadBalancer
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx-ingress-controller
  labels:
    k8s-app: nginx-ingress-lb
spec:
  replicas: 1
  selector:
    k8s-app: nginx-ingress-lb
  template:
    metadata:
      labels:
        k8s-app: nginx-ingress-lb
        name: nginx-ingress-lb
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - image: gcr.io/google_containers/nginx-ingress-controller:0.8.3
          name: nginx-ingress-lb
          imagePullPolicy: Always
          readinessProbe:
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
          livenessProbe:
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            timeoutSeconds: 1
          # use downward API
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          ports:
            - name: http-port
              containerPort: 80
              hostPort: 80
            - name: https-port
              containerPort: 443
              hostPort: 443
            # we expose 18080 to access nginx stats at the url /nginx-status
            # this is optional
            - containerPort: 18080
              hostPort: 18080
          args:
            - /nginx-ingress-controller
            - --default-backend-service=$(POD_NAMESPACE)/default-http-backend
            - --default-ssl-certificate=default/tls-secret
            - --nginx-configmap=$(POD_NAMESPACE)/nginx-load-balancer-conf
            - --v=2
Default Backend (this is copy/paste from .yaml file)
apiVersion: v1
kind: Service
metadata:
  name: default-http-backend
  labels:
    k8s-app: default-http-backend
spec:
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
      name: http
  selector:
    k8s-app: default-http-backend
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: default-http-backend
spec:
  replicas: 1
  selector:
    k8s-app: default-http-backend
  template:
    metadata:
      labels:
        k8s-app: default-http-backend
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: default-http-backend
          # Any image is permissible as long as:
          # 1. It serves a 404 page at /
          # 2. It serves 200 on a /healthz endpoint
          image: gcr.io/google_containers/defaultbackend:1.0
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
              scheme: HTTP
            initialDelaySeconds: 30
            timeoutSeconds: 5
          ports:
            - containerPort: 8080
          resources:
            limits:
              cpu: 10m
              memory: 20Mi
            requests:
              cpu: 10m
              memory: 20Mi
This config uses three secrets:
tls-secret - 3 files: tls.key, tls.crt, dhparam.pem
env-secret - 2 files: admin.sh and settings.json. The container has a start script to set up the environment.
cloud.docker.com-pull
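A hedged sketch of how the tls-secret referenced above could be created from those three files (file paths are placeholders); the controller's --default-ssl-certificate flag above expects it in the default namespace:

kubectl create secret generic tls-secret \
    --namespace default \
    --from-file=tls.crt=./tls.crt \
    --from-file=tls.key=./tls.key \
    --from-file=dhparam.pem=./dhparam.pem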