Getting External-DNS to work with Ingress Objects in Kops AWS 1.7 K8s Cluster - amazon-web-services

I'm trying to figure out how to get this setup to work:
I am using Kube 1.7 (no RBAC) spun up from kops in AWS
I have a single nginx ingress controller for my entire cluster, installed via Helm in the kube-system namespace and exposed through a LoadBalancer service
I have cert-manager set up in kube-system, installed via Helm and using ClusterIssuers
I have external-dns set up in kube-system, installed via Helm
I have multiple applications, one per namespace, with associated Ingress objects in each namespace.
I am annotating the Ingresses with the appropriate annotations for both cert-manager (certmanager.k8s.io/cluster-issuer: letsencrypt-prod) and external-dns (dns.alpha.kubernetes.io/external: app.contoso.com)
In this scenario, cert-manager is reacting appropriately to the Ingress object (modifying it to complete the ACME challenge), but external-dns is not doing anything (logs are saying all hostnames are up to date). If I manually add a Route53 record for the ELB associated with the LB service, everything works as expected. Inspecting the Ingress object, I see that the status block looks like so:
status:
  loadBalancer:
    ingress:
    - {}
which I suppose is why external-dns isn't reacting? How do I get this to work? Per the documentation
More troubleshooting information (pod definitions, ingress definitions, controller logs, etc.) can be found here: https://gist.github.com/DWSR/f6d596850346223393bec23b289c9731

I solved this myself. The nginx ingress controller has a --publish-service command line argument which will cause it to update the status fields on the ingress objects which, in turn, will cause external-dns to create the appropriate DNS records. When installing via Helm, simply set .Values.controller.publishService.enabled to true and this will take effect.
Sources:
https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/cli-arguments.md
https://github.com/kubernetes/charts/tree/master/stable/nginx-ingress#configuration
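To make the role of the status block concrete, here is a minimal sketch of how a DNS controller could derive record targets from an Ingress object. It is a simplified model and not external-dns's actual code; the function name and the example ELB hostname are hypothetical.

```python
def load_balancer_targets(ingress):
    """Return the hostnames/IPs published in an Ingress object's status."""
    entries = (
        ingress.get("status", {})
        .get("loadBalancer", {})
        .get("ingress", [])
    )
    targets = []
    for entry in entries or []:
        # Each entry may carry a hostname (ELB) or an ip; an empty {} carries neither.
        target = entry.get("hostname") or entry.get("ip")
        if target:
            targets.append(target)
    return targets


# Status as shown in the question: a single empty entry.
before = {"status": {"loadBalancer": {"ingress": [{}]}}}

# Hypothetical status once the controller publishes its Service address.
after = {
    "status": {
        "loadBalancer": {
            "ingress": [{"hostname": "example-elb.us-east-1.elb.amazonaws.com"}]
        }
    }
}

print(load_balancer_targets(before))
print(load_balancer_targets(after))
```

With the empty entry there is nothing to point a record at, which is consistent with external-dns reporting that all hostnames are up to date.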

Related

Istio 504 on ALB when the request hits an EKS node other than the one where the Istio gateway is located

Bug Description
I'm using EKS (1.23) and ALB. ALB is terminating TLS with certs provided by ACM.
Using Terraform, I installed the following Helm charts in the EKS cluster:
istio-base
istiod
gateway
all at version 1.15.0.
Other things configured on cluster:
aws_security_group_rules, both ingress and egress, on EKS nodes for ports 15000-15090
required k8s namespaces
required k8s ingress configuring ALB via alb-controller
required ACM certificates for ALB
required Route53 DNS entries
All those things are quite common so I do not think there is any weird stuff there. I have it in multiple places configured that way without Istio.
I also added an httpbin Service and Deployment and the related Gateway and VirtualService.
In the ingress I have 2 paths configured (besides the ssl-redirect directive for the ALB):
/healthz/ready is pointing to status-port
and then / is pointing to http2
The ingress-gateway service is of type NodePort, as required for this type of setup.
(Important) There are 2 nodes in the cluster.
AWS console Target Group details page shows that 2/2 targets are healthy.
Sooooooo ...
When I open https://httpbin.somedomain.com, every second request gets a 504 Gateway Timeout. When I open https://httpbin.somedomain.com/healthz/ready, I get 200 every time. When I increase the number of nodes in the cluster to 3, the 504 occurs for 2 out of 3 requests.
It's quite clear to me that it's related to the ALB doing round robin over the machines ... but why? The status-port always returns 200.
Version
$ istioctl version
client version: 1.15.0
control plane version: 1.15.0
data plane version: 1.15.0 (3 proxies)
$ kubectl version --short
Client Version: v1.23.2
Server Version: v1.23.7-eks-4721010
$ helm version --short
v3.8.0+gd141386
Additional Information
$ istioctl bug-report
Target cluster context: v2-xxx
Running with the following config:
istio-namespace: istio-system
full-secrets: false
timeout (mins): 30
include: { }
exclude: { Namespaces: kube-node-lease,kube-public,kube-system,local-path-storage }
end-time: 2022-09-27 17:29:26.34498 +0200 CEST
Cluster endpoint: https://yyy.yl4.eu-west-1.eks.amazonaws.com
CLI version:
version.BuildInfo{Version:"1.15.0", GitRevision:"e3364ab424b70ca8ee1ca76cb0b3afb73476aaac", GolangVersion:"go1.19", BuildStatus:"Clean", GitTag:"1.15.0"}
The following Istio control plane revisions/versions were found in the cluster:
Revision default:
&version.MeshInfo{
{
Component: "pilot",
Info: version.BuildInfo{Version:"1.15.0", GitRevision:"e3364ab424b70ca8ee1ca76cb0b3afb73476aaac", GolangVersion:"go1.19", BuildStatus:"Clean", GitTag:"1.15.0"},
},
}
The following proxy revisions/versions were found in the cluster:
Revision default: Versions {1.15.0}
Fetching proxy logs for the following containers:
argocd//argo-cd-argocd-application-controller-0/application-controller
argocd/argo-cd-argocd-applicationset-controller/argo-cd-argocd-applicationset-controller-9dddcffbf-zrcgl/applicationset-controller
argocd/argo-cd-argocd-dex-server/argo-cd-argocd-dex-server-75c975ccb7-xmd82/dex-server
argocd/argo-cd-argocd-notifications-controller/argo-cd-argocd-notifications-controller-5854964cbf-z8nlr/notifications-controller
argocd/argo-cd-argocd-redis/argo-cd-argocd-redis-664b98cfd7-lndsf/argo-cd-argocd-redis
argocd/argo-cd-argocd-repo-server/argo-cd-argocd-repo-server-75f49f7ccf-xsblh/repo-server
argocd/argo-cd-argocd-server/argo-cd-argocd-server-6599d8d846-dqr6s/server
first/httpbin/httpbin-7bffdcffd-2klzj/httpbin
first/httpbin/httpbin-7bffdcffd-2klzj/istio-proxy
...
istio-ingress-internal/internal/internal-554ddcb684-kr52c/istio-proxy
istio-ingress-internet-facing/internet-facing/internet-facing-555fd48d8d-2tx74/istio-proxy
istio-system/istiod/istiod-86cd5997bb-r6797/discovery
...
Fetching Istio control plane information from cluster.
Running istio analyze on all namespaces and report as below:
Analysis Report:
Info [IST0102] (Namespace argocd) The namespace is not enabled for Istio injection. Run 'kubectl label namespace argocd istio-injection=enabled' to enable it, or 'kubectl label namespace argocd istio-injection=disabled' to explicitly mark it as not needing injection.
Info [IST0102] (Namespace default) The namespace is not enabled for Istio injection. Run 'kubectl label namespace default istio-injection=enabled' to enable it, or 'kubectl label namespace default istio-injection=disabled' to explicitly mark it as not needing injection.
Info [IST0118] (Service argocd/argo-cd-argocd-applicationset-controller) Port name webhook (port: 7000, targetPort: webhook) doesn't follow the naming convention of Istio port.
...
Creating an archive at /Users/zzz/bug-report.tar.gz.
Cleaning up temporary files in /var/folders/l4/82mt4l7x4r5dzp1j4ppxqqzm0000gn/T/bug-report.
Done.
Original issue here
I solved it by allowing port 80 between the machines in the EKS node group. I do not understand why it helps, TBH.
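One plausible reading of the numbers above, sketched as a small simulation. It assumes (this is not confirmed in the post) that the gateway pod runs on a single node, that the ALB rotates requests evenly over the nodes, and that a node without the pod must forward the request to the pod on port 80 on the other node. All names are hypothetical.

```python
from itertools import cycle


def simulate(nodes, gateway_node, cross_node_port_open, requests):
    """Round-robin requests over nodes and return the resulting status codes."""
    statuses = []
    rotation = cycle(nodes)
    for _ in range(requests):
        node = next(rotation)
        # The request succeeds if it lands on the node hosting the gateway pod,
        # or if the receiving node is allowed to forward it to that node.
        if node == gateway_node or cross_node_port_open:
            statuses.append(200)
        else:
            statuses.append(504)
    return statuses


def failure_ratio(statuses):
    """Fraction of requests that ended in 504."""
    return statuses.count(504) / len(statuses)


two_nodes = simulate(["node-a", "node-b"], "node-a", False, 6)
three_nodes = simulate(["node-a", "node-b", "node-c"], "node-a", False, 6)
port_opened = simulate(["node-a", "node-b"], "node-a", True, 6)

print(failure_ratio(two_nodes))    # 1 of 2 requests fails
print(failure_ratio(three_nodes))  # 2 of 3 requests fail
print(failure_ratio(port_opened))  # no failures once forwarding is allowed
```

Under these assumptions the failure rate is (N-1)/N for N nodes, which matches the observed "every second request" with 2 nodes and "2 out of 3" with 3 nodes.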

Target health check fails - AWS Network Load Balancer

I deployed a web app on AWS using kOps.
I have two nodes and set up a Network Load Balancer.
The target group of the NLB has two nodes (each node is an instance made from the same template).
Load balancer actually seems to be working after checking ingress-nginx-controller logs.
The requests are being distributed over pods correctly. And I can access the service via ingress external address.
But when I go to AWS Console / Target Group, one of the two nodes is marked as unhealthy and I am concerned about that.
Nodes are running correctly.
I opened a shell in the nginx-controller pod and ran curl against both nodes using their internal IP addresses.
For the healthy node, I get nginx response and for the unhealthy node, it times out.
I do not know how nginx was installed on one of the nodes and not on the other one.
Could anybody let me know the possible reasons?
I had exactly the same problem before, and this should be documented somewhere on AWS or Kubernetes. The answer is copied from AWS Premium Support:
Short description
The NGINX Ingress Controller sets the spec.externalTrafficPolicy option to Local to preserve the client IP. Also, requests aren't routed to unhealthy worker nodes. The following troubleshooting implies that you don't need to maintain the cluster IP address or preserve the client IP address.
Resolution
If you check the ingress controller service you will see the External Traffic Policy field set to Local.
$ kubectl -n ingress-nginx describe svc ingress-nginx-controller
Output:
Name: ingress-nginx-controller
Namespace: ingress-nginx
...
External Traffic Policy: Local
...
This Local setting drops packets that are sent to Kubernetes nodes that aren't running instances of the NGINX Ingress Controller. Assign NGINX pods (from the Kubernetes website) to the nodes that you want to schedule the NGINX Ingress Controller on.
Update the spec.externalTrafficPolicy option to Cluster
$ kubectl -n ingress-nginx patch service ingress-nginx-controller -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'
Output:
service/ingress-nginx-controller patched
By default, NodePort services perform source address translation (from the Kubernetes website). For NGINX, this means that the source IP of an HTTP request is always the IP address of the Kubernetes node that received the request. If you set the externalTrafficPolicy field in the ingress-nginx service specification to Cluster, then you can't maintain the source IP address.
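The effect of the two policies on target health can be sketched as follows. This is a simplified model of the behavior described above, not Kubernetes code; node names and the function are hypothetical.

```python
def healthy_nodes(nodes, controller_nodes, policy):
    """Return the nodes a load balancer health check would see as healthy.

    nodes: all nodes registered in the target group
    controller_nodes: nodes that run an ingress controller pod
    policy: value of spec.externalTrafficPolicy ("Local" or "Cluster")
    """
    if policy == "Cluster":
        # Any node can forward traffic to a pod on another node.
        return sorted(nodes)
    if policy == "Local":
        # Only nodes with a local controller pod accept the traffic.
        return sorted(n for n in nodes if n in controller_nodes)
    raise ValueError(f"unknown externalTrafficPolicy: {policy!r}")


all_nodes = {"node-1", "node-2"}
with_controller = {"node-1"}

print(healthy_nodes(all_nodes, with_controller, "Local"))
print(healthy_nodes(all_nodes, with_controller, "Cluster"))
```

With Local, the node without a controller pod fails its health check, which is what the question observed. With Cluster, both nodes pass, at the cost of losing the client source IP.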

Using existing load balancer for K8S service

I have a simple app that I need to deploy in K8S (running on AWS EKS) and expose it to the outside world.
I know that I can add a service with the type LoadBalancer and voilà, K8S will create an AWS load balancer for me.
spec:
  type: LoadBalancer
However, the issue is that it will create a new LB.
The main reason why this is an issue for me is that I am trying to separate infrastructure creation/upgrades from software deployment/upgrades. All of my infrastructure will be managed by Terraform and all of my software will be defined via K8S YAML files (maybe Helm in the future).
And the creation of a load balancer (infrastructure) breaks this model.
Two questions:
Do I understand correctly that you can't change this behavior (create vs. use existing)?
I read multiple articles about K8S and all of them lead me into the direction of Ingress + Ingress Controller. Is this the way to solve this problem?
I am hesitant to go in this direction. There are tons of steps to get it working and it will take time for me to figure out how to retrofit it in Terraform and k8s YAML files
Short answer: you can only change the type to "NodePort" and attach the existing LB manually by registering the EKS nodes with it on the exposed port, like this:
spec:
  type: NodePort
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: http
    nodePort: 30080
Attaching an existing LB natively is not supported by the AWS k8s controller yet, and supporting it may not be a priority, for this reason:
Configuration: controllers get their configuration from k8s ConfigMaps or special CustomResourceDefinitions (CRDs). That configuration would conflict with any manual config on the already existing LB and may lead to the existing config being wiped, because it is not tracked in the controller's config source.
Q: Direct expose or overlay ingress:
Note: Use an ingress (Nginx or AWS ALB) if you have more than one service to expose or you need to add controls on the exposed APIs.
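As a rough sketch of the NodePort approach, the manifest can be generated with a fixed nodePort so that the Terraform-managed load balancer can target a known port on every node. The helper name, service name, and selector below are hypothetical; the range check uses the Kubernetes default NodePort range of 30000-32767, which a cluster may override.

```python
import json

# Kubernetes default NodePort range; the API server can be configured differently.
NODE_PORT_RANGE = range(30000, 32768)


def node_port_service(name, selector, node_port, port=80, target_port="http"):
    """Build a NodePort Service manifest with a pinned nodePort."""
    if node_port not in NODE_PORT_RANGE:
        raise ValueError(f"nodePort {node_port} is outside the default range")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",
            "externalTrafficPolicy": "Cluster",
            "selector": selector,
            "ports": [
                {
                    "name": "http",
                    "port": port,
                    "protocol": "TCP",
                    "targetPort": target_port,
                    "nodePort": node_port,
                }
            ],
        },
    }


manifest = node_port_service("my-app", {"app": "my-app"}, 30080)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f, while the existing load balancer's target group points at port 30080 on the nodes.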

kubernetes LoadBalancer service

I'm trying to teach myself how to use Kubernetes and am having some issues.
I was able to set up a cluster, deploy the nginx image and then access nginx using a service of type NodePort (once I added the port to the security group inbound rules of the node).
My next step was to try to use a service of type LoadBalancer to try to access nginx.
I set up a new cluster and deployed the nginx image.
kubectl \
create deployment my-nginx-deployment \
--image=nginx
I then set up the service for the LoadBalancer
kubectl expose deployment my-nginx-deployment --type=LoadBalancer --port=80 --target-port=8080 --name=nginxpubic
Once it was done setting up, I tried to access nginx using the LoadBalancer Ingress (Which I found from describing the LoadBalancer service). I received a This page isn’t working error.
Not really sure where I went wrong.
results of kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 100.64.0.1 <none> 443/TCP 7h
nginxpubic LoadBalancer 100.71.37.139 a5396ba70d45d11e88f290658e70719d-1485253166.us-west-2.elb.amazonaws.com 80:31402/TCP 7h
From the nginx Docker Hub page, I see that the container is using port 80.
https://hub.docker.com/_/nginx/
It should be like this:
kubectl expose deployment my-nginx-deployment --type=LoadBalancer --port=80 --target-port=80 --name=nginxpubic
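Why the target port matters can be shown with a tiny model: the Service forwards traffic to the target port on the pod, so the request only succeeds if the container listens there. This is an illustrative sketch, not Kubernetes code, and the function name is hypothetical.

```python
def service_reaches_container(target_port, container_ports):
    """True if the Service's target port is one the container listens on."""
    return target_port in container_ports


# Per the nginx Docker Hub page cited in the question, nginx listens on port 80.
NGINX_LISTEN_PORTS = {80}

original = service_reaches_container(8080, NGINX_LISTEN_PORTS)
corrected = service_reaches_container(80, NGINX_LISTEN_PORTS)

print(original)   # False: nothing listens on 8080 inside the container
print(corrected)  # True: traffic arrives on the port nginx serves
```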
Also, make sure the LoadBalancer service type is available in your environment.
Known Issues for minikube installation
Features that require a Cloud Provider will not work in Minikube. These include:
LoadBalancers
Features that require multiple nodes. These include:
Advanced scheduling policies

How does kubernetes select nodes to add to the load balancers on AWS?

Some info:
Kubernetes (1.5.1)
AWS
1 master and 1 node (both ubuntu 16.04)
k8s installed via kubeadm
Terraform made by me
Please don't reply "use kube-up, kops or similar". This is about understanding how k8s works under the hood. There is far too much unexplained magic in the system and I want to understand it.
== Question:
When creating a Service of type load balancer on k8s[aws] (for example):
apiVersion: v1
kind: Service
metadata:
  name: kubernetes-dashboard
  namespace: kube-system
  labels:
    k8s-addon: kubernetes-dashboard.addons.k8s.io
    k8s-app: kubernetes-dashboard
    kubernetes.io/cluster-service: "true"
    facing: external
spec:
  type: LoadBalancer
  selector:
    k8s-app: kubernetes-dashboard
  ports:
  - port: 80
I successfully create an internal or external facing ELB but none of the machines are added to the ELB (I can taint the master too but nothing changes). My problem is basically this:
https://github.com/kubernetes/kubernetes/issues/29298#issuecomment-260659722
The subnets and nodes (but not the VPC) are all tagged with "KubernetesCluster" (again... the ELBs are created in the right place). However, no nodes are added.
In the logs
kubectl logs kube-controller-manager-ip-x-x-x-x -n kube-system
after:
aws_loadbalancer.go:63] Creating load balancer for kube-system/kubernetes-dashboard with name: acd8acca0c7a111e69ca306f22de69ae
There is no other output (it should print the nodes added or removed). I tried to understand the code at:
https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws_loadbalancer.go
But whatever the reason, this function does not add nodes.
The documentation doesn't go at length trying to explain the "process" behind k8s decisions. To try to understand k8s I tried/used kops, kube up, kubeadm, kubernetes the hard way repo and reading damn code, but still I am unable to understand how k8s on aws SELECTS the node to add to the elb.
As a consequence, also no security group is changed anywhere.
Is it a tag on the ec2?
Kubelet setting?
Anything else?
Any help is greatly appreciated.
Thanks,
F.
I think Steve is on the right track. Make sure your kubelets, apiserver, and controller-manager components all include --cloud-provider=aws in their argument lists.
You mention your subnets and instances all have matching KubernetesCluster tags. Do your controller & worker security groups? K8s will modify the worker SG in particular to allow traffic to/from the service ELBs it creates. I tag my VPC as well, though I guess it's not required and may prohibit another cluster from living in the same VPC.
I also tag my private subnets with kubernetes.io/role/internal-elb=true and public ones with kubernetes.io/role/elb=true to identify where internal and public ELBs can be created.
The full list (AFAIK) of tags and annotations lives in https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go
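The tagging scheme described in this answer can be sketched as a small check over subnet tags. This is a simplified illustration, not the cloud provider's actual selection logic; the function name, cluster name, and the choice to test only for the presence of the role tag keys are assumptions.

```python
CLUSTER_TAG = "KubernetesCluster"
PUBLIC_ELB_TAG = "kubernetes.io/role/elb"
INTERNAL_ELB_TAG = "kubernetes.io/role/internal-elb"


def elb_roles(tags, cluster_name):
    """Return which kinds of ELB a subnet is marked for, given its tags."""
    if tags.get(CLUSTER_TAG) != cluster_name:
        # Subnet does not belong to this cluster.
        return []
    roles = []
    if PUBLIC_ELB_TAG in tags:
        roles.append("public")
    if INTERNAL_ELB_TAG in tags:
        roles.append("internal")
    return roles


public_subnet = {CLUSTER_TAG: "my-cluster", PUBLIC_ELB_TAG: "true"}
private_subnet = {CLUSTER_TAG: "my-cluster", INTERNAL_ELB_TAG: "true"}
foreign_subnet = {CLUSTER_TAG: "other-cluster", PUBLIC_ELB_TAG: "true"}

for subnet in (public_subnet, private_subnet, foreign_subnet):
    print(elb_roles(subnet, "my-cluster"))
```

Running a check like this over the tags that Terraform applies is a quick way to spot a subnet that is missing the cluster tag or a role tag before debugging the controller-manager logs.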
I think the node registration is being managed outside of Kubernetes. I'm using kops and if I edit the size of my ASG in AWS the new nodes are not registered with my service ELBs. But if I edit the number of nodes using kops the new nodes are there.
In the docs, a kops instance group maps to an ASG when running on AWS. In the code it looks like it's calling AWS rather than a k8s API.
I know you're not using kops but I think in Terraform you need to replicate the AWS API calls that kops is making.
Make sure you are setting the correct cloud provider settings with kubeadm (http://kubernetes.io/docs/admin/kubeadm/).
The AWS cloud provider automatically syncs the available nodes with the ELB. I created a Service of type LoadBalancer, then scaled my cluster, and the new node was eventually added to the ELB: https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws_loadbalancer.go#L376