AWS EKS - Failure creating load balancer controller - amazon-web-services

I am trying to create an application load balancer controller on my EKS cluster by following
this link
When I run these steps (after making the necessary changes to the downloaded yaml file)
curl -o v2_1_2_full.yaml https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.1.2/docs/install/v2_1_2_full.yaml
kubectl apply -f v2_1_2_full.yaml
I get this output
customresourcedefinition.apiextensions.k8s.io/targetgroupbindings.elbv2.k8s.aws configured
mutatingwebhookconfiguration.admissionregistration.k8s.io/aws-load-balancer-webhook configured
role.rbac.authorization.k8s.io/aws-load-balancer-controller-leader-election-role unchanged
clusterrole.rbac.authorization.k8s.io/aws-load-balancer-controller-role configured
rolebinding.rbac.authorization.k8s.io/aws-load-balancer-controller-leader-election-rolebinding unchanged
clusterrolebinding.rbac.authorization.k8s.io/aws-load-balancer-controller-rolebinding unchanged
service/aws-load-balancer-webhook-service unchanged
deployment.apps/aws-load-balancer-controller unchanged
validatingwebhookconfiguration.admissionregistration.k8s.io/aws-load-balancer-webhook configured
Error from server (InternalError): error when creating "v2_1_2_full.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s: no endpoints available for service "cert-manager-webhook"
Error from server (InternalError): error when creating "v2_1_2_full.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s: no endpoints available for service "cert-manager-webhook"
The load balancer controller doesnt appear to start up because of this and never gets to the ready state
Has anyone any suggestions on how to resolve this issue?

Turns out the taints on my nodegroup prevented the cert-manager pods from starting on any node.
These commands helped debug and led me to a fix for this issue:
kubectl get po -n cert-manager
kubectl describe po <pod id> -n cert-manager
My solution was to create another nodeGroup with no taints specified. This allowed the cert-manager to run.

Related

Istio Ingress not showing address (Kubeflow on AWS)

I'm trying to setup kubeflow on AWS, I did follow this tutorial to setup kubeflow on AWS.
I used dex instead of cognito with following policy.
then at step: kfctl apply -V -f kfctl_aws.yaml , first I received this error:
IAM for Service Account is not supported on non-EKS cluster
So to fix this I set the property enablePodIamPolicy: false
Then retried and it successfully deployed kubeflow, on checking services status using kubectl -n kubeflow get all, I found all services ready except MPI operator.
ignoring this when I tried to run kubectl get ingress -n istio-system
I got the following result.
upon investigation using kubectl -n kubeflow logs $(kubectl get pods -n kubeflow --selector=app=aws-alb-ingress-controller --output=jsonpath={.items..metadata.name})
I found the following error:
E1104 12:09:37.446342 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to reconcile LB managed SecurityGroup: failed to reconcile managed LoadBalancer securityGroup: UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message: Lsvzm7f4rthL4Wxn6O8wiQL1iYXQUES_9Az_231BV7fyjgs7CHrwgUOVTNTf4334_C4voUogjSuCoF8GTOKhc5A7zAFzvcGUKT_FBs6if06KMQCLiCoujgfoqKJbG75pPsHHDFARIAdxNYZeIr4klmaUaxbQiFFxpvQsfT4ZkLMD7jmuQQcrEIw_U0MlpCQGkcvC69NRVVKjynIifxPBySubw_O81zifDp0Dk8ciRysaN1SbF85i8V3LoUkrtwROhUI9aQYJgYgSJ1CzWpfNLplbbr0X7YIrTDKb9sMhmlVicj_Yng0qFka_OVmBjHTnpojbKUSN96uBjGYZqC2VQXM1svLAHDTU1yRruFt5myqjhJ0fVh8Imhsk1Iqh0ytoO6eFoiLTWK4_Crb8XPS5tptBBzpEtgwgyk4QwOmzySUwkvNdDB-EIsTJcg5RQJl8ds4STNwqYV7XXeWxYQsmL1vGPVFY2lh_MX6q1jA9n8smxITE7F6AXsuRHTMP5q0jk58lbrUe-ZvuaD1b0kUTvpO3JtwWwxRd7jTKF7xde2InNOXwXxYCxHOw0sMX56Y1wLkvEDTLrNLZWOACS-T5o7mXDip43U0sSoUtMccu7lpfQzH3c7lNdr9s2Wgz4OqYaQYWsxNxRlRBdR11TRMweZt4Ta6K-7si5Z-rrcGmjG44NodT0O14Gzj-S4i6bK-qPYvUEsVeUl51ev_MsnBKtCXcMF8W6j9D7Oe3iGj13uvlVJEtq3OIoRjBXIuQQ012H0b3nQqlkoKEvsPAA_txAjgHXVzEVcM301_NDQikujTHdnxHNdzMcCfY7DQeeOE_2FT_hxYGlbuIg5vonRTT7MfSP8_LUuoIICGS81O-hDXvCLoomltb1fqCBBU2jpjIvNALMwNdJmMnwQOcIMI_QonRKoe5W43v\n\tstatus code: 403, request id: a9be63bd-2a3a-4a21-bb87-93532923ffd2" "controller"="alb-ingress-controller" "request"={"Namespace":"istio-system","Name":"istio-ingress"}
I don't understand what exactly went wrong in security permissions ?
The alb-ingress-controller doesn't have permission to create an ALB.
By setting the enablePodIamPolicy: false, I assume you go for option 2 of the guide.
The alb-ingress-controller uses the kf-admin role, and the installer needs attach on that role a policy found in aws-config/iam-alb-ingress-policy.json. Most probably it's not installed, so you'll have to add it in IAM and attach it to the role.
After doing that, observe the reconciler logs of the alb-ingress-controller to see if it's able to create the ALB.
It's likely the cluster-name in the aws-alb-ingress-controller-config is not correctly configured.
If that's the case, you should edit the Config Map to the right cluster name using kubectl edit cm aws-alb-ingress-controller-config -n kubeflow.
After that you should delete the pod so it restarts (kubectl -n kubeflow delete pod $(kubectl get pods -n kubeflow --selector=app=aws-alb-ingress-controller --output=jsonpath={.items..metadata.name})).

Error: action failed after 10 attempts: failed to connect to the management cluster. Get https://127.0.0.1:43343/api?timeout=30s: EOF

I am creating an EKS-Anywhere local cluster by following these steps: Create local cluster | EKS Anywhere
Getting the following error after executing this command.
eksctl anywhere create cluster -f $CLUSTER_NAME.yaml
Performing setup and validations
Warning: The docker infrastructure provider is meant for local development and testing only
✅ Docker Provider setup is valid
Creating new bootstrap cluster
Installing cluster-api providers on bootstrap cluster
Provider specific setup
Creating new workload cluster
Installing networking on workload cluster
Installing storage class on workload cluster
Installing cluster-api providers on workload cluster
Moving cluster management from bootstrap to workload cluster
Error: failed to create cluster: error moving CAPI management from source to target: failed moving management cluster: Performing move...
Discovering Cluster API objects
Moving Cluster API objects Clusters=1
Creating objects in the target cluster
Deleting objects from the source cluster
Error: action failed after 10 attempts: failed to connect to the management cluster: action failed after 9 attempts: Get https://127.0.0.1:43343/api?timeout=30s: EOF
Upgrade your cert-manager
There is known issue: clusterctl init fails when existing cert-manager runs 1.0+ · Issue #3836 · kubernetes-sigs/cluster-api
And there is a solution: ⚠️ Upgrade cert-manager to v1.1.0 by fabriziopandini · Pull Request #4013 · kubernetes-sigs/cluster-api
And it works:
Cluster API is using cert-manager v1.1.0 now, so this should not be a problem anymore
So, I'd suggest upgrading.
It could be a resource constraint on your docker deployment. How much RAM and disk is Docker configured with. I have something like 16gb RAM and 60 gig disk which is more than required, but it does work.

Kubernetes pod Back-off restarting failed container with CrashLoopBack error

Github repo
After I configured the kubectl with the AWS EKS cluster, I deployed the services using these commands:
kubectl apply -f env-configmap.yaml
kubectl apply -f env-secret.yaml
kubectl apply -f aws-secret.yaml
# this is repeated for all services
kubectl apply -f svcname-deploymant.yaml
kubectl apply -f svcname-service.yaml
The other services ran successfully but the reverse proxy returned an error and when I investigated by running the command kubectl describe pod reverseproxy...
I got this info:
https://pastebin.com/GaREMuyj
[Edited]
After running the command kubectl logs -f reverseproxy-667b78569b-qg7p I get this:
As David Maze very rightly pointed out, your problem is not reproducible. You haven't provided all the configuration files, for example. However, the error you received clearly tells about the problem:
host not found in upstream "udagram-users: 8080" in /etc/nginx/nginx.conf:11
This error makes it clear that you are trying to connect to host udagram-users: 8080 as defined in file /etc/nginx/nginx.conf on line 11.
And how can I solve it please?
You need to check the connection. (It is also possible that you entered the wrong hostname or port in the config). You mentioned that you are using multiple subnets:
it is using 5 subnets
In such a situation, it is very likely that there is no connection because the individual components operate on different networks and will never be able to communicate with each other. If you run all your containers on one network, it should work. If, on the other hand, you want to use multiple subnets, you need to ensure container-to-container communication across multiple subnets.
See also this similar problem with many possible solutions.

kubectl Error: You must be logged in to the server

I've checked almost all of the answers on here, but nothing has resolved this yet.
When running kubectl, I will consistently get error: You must be logged in to the server (Unauthorized).
I have tried editing the config file via kubectl config --kubeconfig=config view, but I still receive the same error, even when running kubectl edit -n kube-system configmap/aws-auth.
Even when I just try to analyze my clusters and run aws eks list-clusters, I receive a different error An error occurred (UnrecognizedClientException) when calling the ListClusters operation: The security token included in the request is invalid.
I have completely torn down my clusters on EKS and rebuilding them, but I keep encountering these same errors. This is my first time attempting to use AWS EKS, and I've been trying different things for a few days.
I've set my aws configure
λ aws configure
AWS Access Key ID [****************Q]: *****
AWS Secret Access Key [****************5]: *****
Default region name [us-west-2]: us-west-2
Default output format [json]: json
Even when trying to look at the config map, I receive the same error:
λ kubectl describe configmap -n kube-system aws-auth
error: You must be logged in to the server (Unauthorized)
For me the problem was because of the system time, below solved the issue for me.
sudo apt install ntp
service ntp restart

Google container Engine django deployment is raising error

I have set up my Django project to deploy on the container engine based on documentation https://cloud.google.com/python/django/container-engine.
After creating kubernetes resources with
kubectl create -f project.yaml
I try to get the status of the pods with
kubectl get pods
Each of the pod has status **CrashLoopBackOff**
Can you please suggest on debugging this error?