Unable to deploy aws-load-balancer-controller on Kubernetes

I am trying to deploy the aws-load-balancer-controller on my Kubernetes cluster on AWS by following the steps given in https://docs.aws.amazon.com/eks/latest/userguide/aws-load-balancer-controller.html
After the yaml file is applied, when I try to check the status of the deployment, I get:
$ kubectl get deployment -n kube-system aws-load-balancer-controller
NAME READY UP-TO-DATE AVAILABLE AGE
aws-load-balancer-controller 0/1 1 0 6m39s
I tried to debug it and I got this :
$ kubectl logs -n kube-system deployment.apps/aws-load-balancer-controller
{"level":"info","logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"error","logger":"setup","msg":"unable to create controller","controller":"Ingress","error":"the server could not find the requested resource"}
The yaml file is pulled directly from https://github.com/kubernetes-sigs/aws-load-balancer-controller/releases/download/v2.3.0/v2_3_0_full.yaml and apart from changing the Kubernetes cluster name, no other modifications are done.
Please let me know if I am missing some step in the configuration.
Any help would be highly appreciated.

I am not sure if this helps, but for me the issue was that the version of the aws-load-balancer-controller was not compatible with the version of Kubernetes.
aws-load-balancer-controller = v2.3.1
Kubernetes/EKS = 1.22
Github issue for more information:
https://github.com/kubernetes-sigs/aws-load-balancer-controller/issues/2495
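If you hit the same error, a minimal sketch of checking and fixing the mismatch could look like this (the v2.4.1 release below is only an example; pick whichever controller release the project lists as compatible with your cluster version):
# check which Kubernetes version the cluster is running
kubectl version
# remove the incompatible manifest that was applied earlier
kubectl delete -f v2_3_0_full.yaml
# apply a controller release that supports your Kubernetes version (version/URL are assumptions)
kubectl apply -f https://github.com/kubernetes-sigs/aws-load-balancer-controller/releases/download/v2.4.1/v2_4_1_full.yaml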

Related

kubectl wait for Service on AWS EKS to expose Elastic Load Balancer (ELB) address reported in .status.loadBalancer.ingress field

As the kubernetes.io docs state about a Service of type LoadBalancer:
On cloud providers which support external load balancers, setting the type field to LoadBalancer provisions a load balancer for your Service. The actual creation of the load balancer happens asynchronously, and information about the provisioned balancer is published in the Service's .status.loadBalancer field.
On AWS Elastic Kubernetes Service (EKS) an AWS Load Balancer is provisioned that load balances network traffic (see the AWS docs & the example project on GitHub provisioning an EKS cluster with Pulumi). Assuming we have a Deployment ready with the selector app=tekton-dashboard (it's the default Tekton dashboard you can deploy as stated in the docs), a Service of type LoadBalancer defined in tekton-dashboard-service.yml could look like this:
apiVersion: v1
kind: Service
metadata:
  name: tekton-dashboard-external-svc-manual
spec:
  selector:
    app: tekton-dashboard
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9097
  type: LoadBalancer
If we create the Service in our cluster with kubectl apply -f tekton-dashboard-service.yml -n tekton-pipelines, the AWS ELB gets created automatically.
There's only one problem: the .status.loadBalancer field is populated with the ingress[0].hostname field asynchronously and is therefore not available immediately. We can check this if we run the following commands together:
kubectl apply -f tekton-dashboard-service.yml -n tekton-pipelines && \
kubectl get service/tekton-dashboard-external-svc-manual -n tekton-pipelines --output=jsonpath='{.status.loadBalancer}'
The output will be an empty field:
{}%
So if we want to run this setup in a CI pipeline for example (e.g. GitHub Actions, see the example project's workflow provision.yml), we need to somehow wait until the .status.loadBalancer field got populated with the AWS ELB's hostname. How can we achieve this using kubectl wait?
TLDR;
Prior to Kubernetes v1.23 it's not possible with kubectl wait, but it can be done using until together with grep like this:
until kubectl get service/tekton-dashboard-external-svc-manual -n tekton-pipelines --output=jsonpath='{.status.loadBalancer}' | grep "ingress"; do : ; done
or even enhance the command using timeout (brew install coreutils on a Mac) to prevent the command from running infinitely:
timeout 10s bash -c 'until kubectl get service/tekton-dashboard-external-svc-manual -n tekton-pipelines --output=jsonpath='{.status.loadBalancer}' | grep "ingress"; do : ; done'
Problem with kubectl wait & the solution explained in detail
As stated in this SO Q&A and the Kubernetes issues kubectl wait unable to not wait for service ready #80828 & kubectl wait on arbitrary jsonpath #83094, using kubectl wait for this isn't possible in current Kubernetes versions.
The main reason is that kubectl wait assumes the status field of a Kubernetes resource queried with kubectl get service/xyz --output=yaml contains a conditions list, which a Service doesn't have. Using jsonpath here would be a solution and will be possible from Kubernetes v1.23 on (see this merged PR). But until that version is broadly available in managed Kubernetes clusters like EKS, we need another solution. And it should also be available as a "one-liner", just as kubectl wait would be.
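For reference, once both the cluster and kubectl are on v1.23+, the new jsonpath support of kubectl wait looks roughly like this (a sketch of the syntax introduced by that PR; it compares a jsonpath expression against a concrete value, the pod name here is just an example):
# wait until the pod reports phase Running (v1.23+ jsonpath syntax)
kubectl wait --for=jsonpath='{.status.phase}'=Running pod/busybox1 --timeout=60s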
A good starting point could be this superuser answer about "watching" the output of a command until a particular string is observed and then exit:
until my_cmd | grep "String Im Looking For"; do : ; done
If we use this approach together with a kubectl get we can craft a command which will wait until the field ingress gets populated into the status.loadBalancer field in our Service:
until kubectl get service/tekton-dashboard-external-svc-manual -n tekton-pipelines --output=jsonpath='{.status.loadBalancer}' | grep "ingress"; do : ; done
This will wait until the ingress field gets populated and then print out the AWS ELB address (e.g. by running kubectl get service tekton-dashboard-external-svc-manual -n tekton-pipelines --output=jsonpath='{.status.loadBalancer.ingress[0].hostname}' thereafter):
$ until kubectl get service/tekton-dashboard-external-svc-manual -n tekton-pipelines --output=jsonpath='{.status.loadBalancer}' | grep "ingress"; do : ; done
{"ingress":[{"hostname":"a74b078064c7d4ba1b89bf4e92586af0-18561896.eu-central-1.elb.amazonaws.com"}]}
Now we have a one-liner command that behaves just like a kubectl wait for our Service to become available through the AWS LoadBalancer. We can double check that this is working with the following commands combined (be sure to delete the Service using kubectl delete service/tekton-dashboard-external-svc-manual -n tekton-pipelines before you execute it, because otherwise the Service including the AWS LoadBalancer already exists):
kubectl apply -f tekton-dashboard-service.yml -n tekton-pipelines && \
until kubectl get service/tekton-dashboard-external-svc-manual -n tekton-pipelines --output=jsonpath='{.status.loadBalancer}' | grep "ingress"; do : ; done && \
kubectl get service tekton-dashboard-external-svc-manual -n tekton-pipelines --output=jsonpath='{.status.loadBalancer.ingress[0].hostname}'
Here's also a full GitHub Actions pipeline run if you're interested.
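If you want to drop this into a GitHub Actions workflow, a hedged sketch of such a step could look like the following (the step name is hypothetical and the file paths are assumptions based on the example project):
- name: Wait for AWS ELB hostname   # hypothetical step name
  run: |
    kubectl apply -f tekton-dashboard-service.yml -n tekton-pipelines
    until kubectl get service/tekton-dashboard-external-svc-manual -n tekton-pipelines --output=jsonpath='{.status.loadBalancer}' | grep "ingress"; do : ; done
    kubectl get service tekton-dashboard-external-svc-manual -n tekton-pipelines --output=jsonpath='{.status.loadBalancer.ingress[0].hostname}'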

Istio Ingress not showing address (Kubeflow on AWS)

I'm trying to set up Kubeflow on AWS; I followed this tutorial to set up Kubeflow on AWS.
I used Dex instead of Cognito, with the following policy.
Then at the step kfctl apply -V -f kfctl_aws.yaml, I first received this error:
IAM for Service Account is not supported on non-EKS cluster
So to fix this I set the property enablePodIamPolicy: false.
Then I retried and it successfully deployed Kubeflow. On checking the services' status using kubectl -n kubeflow get all, I found all services ready except the MPI operator.
Ignoring this, when I tried to run kubectl get ingress -n istio-system, I got the following result.
Upon investigation using kubectl -n kubeflow logs $(kubectl get pods -n kubeflow --selector=app=aws-alb-ingress-controller --output=jsonpath={.items..metadata.name})
I found the following error:
E1104 12:09:37.446342 1 controller.go:217] kubebuilder/controller "msg"="Reconciler error" "error"="failed to reconcile LB managed SecurityGroup: failed to reconcile managed LoadBalancer securityGroup: UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message: Lsvzm7f4rthL4Wxn6O8wiQL1iYXQUES_9Az_231BV7fyjgs7CHrwgUOVTNTf4334_C4voUogjSuCoF8GTOKhc5A7zAFzvcGUKT_FBs6if06KMQCLiCoujgfoqKJbG75pPsHHDFARIAdxNYZeIr4klmaUaxbQiFFxpvQsfT4ZkLMD7jmuQQcrEIw_U0MlpCQGkcvC69NRVVKjynIifxPBySubw_O81zifDp0Dk8ciRysaN1SbF85i8V3LoUkrtwROhUI9aQYJgYgSJ1CzWpfNLplbbr0X7YIrTDKb9sMhmlVicj_Yng0qFka_OVmBjHTnpojbKUSN96uBjGYZqC2VQXM1svLAHDTU1yRruFt5myqjhJ0fVh8Imhsk1Iqh0ytoO6eFoiLTWK4_Crb8XPS5tptBBzpEtgwgyk4QwOmzySUwkvNdDB-EIsTJcg5RQJl8ds4STNwqYV7XXeWxYQsmL1vGPVFY2lh_MX6q1jA9n8smxITE7F6AXsuRHTMP5q0jk58lbrUe-ZvuaD1b0kUTvpO3JtwWwxRd7jTKF7xde2InNOXwXxYCxHOw0sMX56Y1wLkvEDTLrNLZWOACS-T5o7mXDip43U0sSoUtMccu7lpfQzH3c7lNdr9s2Wgz4OqYaQYWsxNxRlRBdR11TRMweZt4Ta6K-7si5Z-rrcGmjG44NodT0O14Gzj-S4i6bK-qPYvUEsVeUl51ev_MsnBKtCXcMF8W6j9D7Oe3iGj13uvlVJEtq3OIoRjBXIuQQ012H0b3nQqlkoKEvsPAA_txAjgHXVzEVcM301_NDQikujTHdnxHNdzMcCfY7DQeeOE_2FT_hxYGlbuIg5vonRTT7MfSP8_LUuoIICGS81O-hDXvCLoomltb1fqCBBU2jpjIvNALMwNdJmMnwQOcIMI_QonRKoe5W43v\n\tstatus code: 403, request id: a9be63bd-2a3a-4a21-bb87-93532923ffd2" "controller"="alb-ingress-controller" "request"={"Namespace":"istio-system","Name":"istio-ingress"}
I don't understand what exactly went wrong with the security permissions.
The alb-ingress-controller doesn't have permission to create an ALB.
By setting the enablePodIamPolicy: false, I assume you go for option 2 of the guide.
The alb-ingress-controller uses the kf-admin role, and the installer needs to attach to that role a policy found in aws-config/iam-alb-ingress-policy.json. Most probably it's not installed, so you'll have to add it in IAM and attach it to the role.
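A minimal sketch of doing that with the AWS CLI, assuming the role really is named kf-admin and the policy document sits at aws-config/iam-alb-ingress-policy.json (the policy name and the account id placeholder below are assumptions, adjust them to your setup):
# create a managed policy from the document shipped with the installer
aws iam create-policy --policy-name alb-ingress-policy --policy-document file://aws-config/iam-alb-ingress-policy.json
# attach it to the role the alb-ingress-controller runs under
aws iam attach-role-policy --role-name kf-admin --policy-arn arn:aws:iam::<ACCOUNT_ID>:policy/alb-ingress-policy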
After doing that, observe the reconciler logs of the alb-ingress-controller to see if it's able to create the ALB.
It's likely the cluster-name in the aws-alb-ingress-controller-config is not correctly configured.
If that's the case, you should edit the Config Map to the right cluster name using kubectl edit cm aws-alb-ingress-controller-config -n kubeflow.
After that you should delete the pod so it restarts (kubectl -n kubeflow delete pod $(kubectl get pods -n kubeflow --selector=app=aws-alb-ingress-controller --output=jsonpath={.items..metadata.name})).

Kubectl get deployments shows No resources found in default namespace

I am trying my hand at Kubernetes and I tried to deploy an image into a k8s service:
root#KubernetesMiniKube:/usr/local/bin# kubectl run hello-minikube --image=k8s.gcr.io/echoserver:1.10 --port=8080
pod/hello-minikube created
root#KubernetesMiniKube:/usr/local/bin# kubectl get pod
NAME READY STATUS RESTARTS AGE
hello-minikube 1/1 Running 0 16s
root#KubernetesMiniKube:/usr/local/bin# kubectl get deployments
No resources found in default namespace.
Why am I seeing "No resources found" when there is actually a resource running inside the default namespace?
When you use $ kubectl run it will create a pod.
In your example that's exactly what happened: it created a pod named hello-minikube.
pod/hello-minikube created
If you want to create a deployment
Deployments represent a set of multiple, identical Pods with no unique identities. A Deployment runs multiple replicas of your application and automatically replaces any instances that fail or become unresponsive.
you can do it using command:
$ kubectl create deployment hello-minikube --image=k8s.gcr.io/echoserver:1.10 --port=8080
deployment.apps/hello-minikube created
user#cloudshell:$ kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
hello-minikube 1/1 1 1 8s
You can also create deployment using YAML.
Save YAML from this documentation example and use kubectl apply.
$ vi nginx.yaml
<paste a proper YAML definition; you can also use the nano editor, or download a ready-made yaml>
user#cloudshell:$ kubectl apply -f nginx.yaml
deployment.apps/nginx-deployment created
$ kubectl get deployments
NAME READY UP-TO-DATE AVAILABLE AGE
hello-minikube 1/1 1 1 3m48s
nginx-deployment 3/3 3 3 64s
Please let me know if you have further questions regarding this answer.

Kubectl command throwing error: Unable to connect to the server: getting credentials: exec: exit status 2

I am doing a lab setup of EKS/kubectl and, after completing the cluster build, I run the following:
> kubectl get node
And I get the following error:
Unable to connect to the server: getting credentials: exec: exit status 2
Moreover, I am sure it is a configuration issue, for running kubectl version gives:
usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]
To see help text, you can run:
aws help
aws <command> help
aws <command> <subcommand> help
aws: error: argument operation: Invalid choice, valid choices are:
create-cluster | delete-cluster
describe-cluster | describe-update
list-clusters | list-updates
update-cluster-config | update-cluster-version
update-kubeconfig | wait
help
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.1", GitCommit:"d224476cd0730baca2b6e357d144171ed74192d6", GitTreeState:"clean", BuildDate:"2020-01-14T21:04:32Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"darwin/amd64"}
Unable to connect to the server: getting credentials: exec: exit status 2
Please advise next steps for troubleshooting.
Please delete the cache folder present in
~/.aws/cli/cache
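For example (a one-liner sketch; double-check the path before deleting anything):
rm -rf ~/.aws/cli/cache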
For me running kubectl get nodes or kubectl cluster-info gives me the following error.
Unable to connect to the server: getting credentials: exec: executable kubelogin not found
It looks like you are trying to use a client-go credential plugin that is not installed.
To learn more about this feature, consult the documentation available at:
https://kubernetes.io/docs/reference/access-authn-authz/authentication/#client-go-credential-plugins
I did the following to resolve this.
Deleted all of the contents inside ~/.kube/. In my case, it's a Windows machine, so it's C:\Users\nis\.kube. Here nis is the user name that I logged in as.
Ran the get credentials command as follows.
az aks get-credentials --resource-group terraform-aks-dev --name terraform-aks-dev-aks-cluster --admin
Note the --admin at the end. Without it, it gives me the same error.
Now the above two commands are working.
Reference: https://blog.baeke.info/2021/06/03/a-quick-look-at-azure-kubelogin/
Do you have the kubectl configuration file ready?
Normally we put it under ~/.kube/config and the file includes the cluster endpoint, certificate, contexts, admin users, and so on.
For further details, read this document: https://docs.aws.amazon.com/eks/latest/userguide/create-kubeconfig.html
In my case, as I am using Azure (not AWS), I had to install "kubelogin", which resolved the issue.
"kubelogin" is a client-go credential (exec) plugin implementing Azure authentication. This plugin provides features that are not available in kubectl. It is supported on kubectl v1.11+.
Can you check your ~/.kube/config file?
For example, if you have started a local cluster using minikube and your config is available, you should not be getting the server error.
Sample config file
apiVersion: v1
clusters:
- cluster:
    certificate-authority: /Users/singhvi/.minikube/ca.crt
    server: https://127.0.0.1:32772
  name: minikube
contexts:
- context:
    cluster: minikube
    user: minikube
  name: minikube
current-context: minikube
kind: Config
preferences: {}
users:
- name: minikube
  user:
    client-certificate: /Users/singhvi/.minikube/profiles/minikube/client.crt
    client-key: /Users/singhvi/.minikube/profiles/minikube/client.key
You need to update/recreate your local kubeconfig. In my case I deleted the whole ~/.kube/config and followed this tutorial:
https://docs.aws.amazon.com/eks/latest/userguide/create-kubeconfig.html
Make sure you have installed AWS CLI.
https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
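A quick sanity check after installing could look roughly like this (region and cluster name are placeholders):
aws --version                  # confirm the CLI is on the PATH
aws sts get-caller-identity    # confirm which identity kubectl will authenticate as
aws eks update-kubeconfig --region <region> --name <cluster-name>   # rebuild ~/.kube/config
kubectl get nodes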
I had the same problem. The issue was that in my .aws/credentials file there were multiple users, and the user that had permissions on the EKS cluster (admin_test) wasn't the default user. So in my case, I made the "admin_test" user my default user in the CLI using an environment variable:
export AWS_PROFILE='admin_test'
After that, I checked the default user with the command:
aws sts get-caller-identity
Finally, I was able to get the nodes with the kubectl get nodes command.
Reference: https://docs.aws.amazon.com/eks/latest/userguide/create-kubeconfig.html
In EKS you can retrieve your kubectl credentials using the following command:
% aws eks update-kubeconfig --name cluster_name
Updated context arn:aws:eks:eu-west-1:xxx:cluster/cluster_name in /Users/theofpa/.kube/config
You can retrieve your cluster name using:
% aws eks list-clusters
{
"clusters": [
"cluster_name"
]
}
I had the same error and solved it by upgrading my awscli to the latest version.
Removing and adding the ~/.aws/credentials file worked to resolve this issue for me.
rm ~/.aws/credentials
touch ~/.aws/credentials
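After recreating the empty file you still have to repopulate it, e.g. with the interactive prompt (a sketch; it asks for the access key, secret key, default region and output format):
aws configure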

Two clusters on EKS, how to switch between them

I am not exactly sure what's going on which is why I am asking this question. When I run this command:
kubectl config get-clusters
I get:
arn:aws:eks:us-west-2:91xxxxx371:cluster/eks-cluster-1
arn:aws:eks:us-west-2:91xxxxx371:cluster/eks1
then I run:
kubectl config current-context
and I get:
arn:aws:eks:us-west-2:91xxxxx371:cluster/eks-cluster-1
and if I run kubectl get pods, I get the expected output.
But how do I switch to the other cluster/context? What's the difference between a cluster and a context? I can't figure out how these commands differ:
When I run them, I still get the pods from the wrong cluster:
root#4c2ab870baaf:/# kubectl config set-context arn:aws:eks:us-west-2:913617820371:cluster/eks1
Context "arn:aws:eks:us-west-2:913617820371:cluster/eks1" modified.
root#4c2ab870baaf:/#
root#4c2ab870baaf:/# kubectl get pods
NAME READY STATUS RESTARTS AGE
apache-spike-579598949b-5bjjs 1/1 Running 0 14d
apache-spike-579598949b-957gv 1/1 Running 0 14d
apache-spike-579598949b-k49hf 1/1 Running 0 14d
root#4c2ab870baaf:/# kubectl config set-cluster arn:aws:eks:us-west-2:91xxxxxx371:cluster/eks1
Cluster "arn:aws:eks:us-west-2:91xxxxx371:cluster/eks1" set.
root#4c2ab870baaf:/# kubectl get pods
NAME READY STATUS RESTARTS AGE
apache-spike-579598949b-5bjjs 1/1 Running 0 14d
apache-spike-579598949b-957gv 1/1 Running 0 14d
apache-spike-579598949b-k49hf 1/1 Running 0 14d
So I really don't know how to properly switch between clusters or contexts, and how to switch the auth routine when doing so.
For example:
contexts:
- context:
    cluster: arn:aws:eks:us-west-2:91xxxxx371:cluster/ignitecluster
    user: arn:aws:eks:us-west-2:91xxxx371:cluster/ignitecluster
  name: arn:aws:eks:us-west-2:91xxxxx371:cluster/ignitecluster
- context:
    cluster: arn:aws:eks:us-west-2:91xxxx371:cluster/teros-eks-cluster
    user: arn:aws:eks:us-west-2:91xxxxx371:cluster/teros-eks-cluster
  name: arn:aws:eks:us-west-2:91xxxxx371:cluster/teros-eks-cluster
To clarify on the difference between set-context and use-context
A context is a group of access parameters. Each context contains a Kubernetes cluster, a user, and a namespace. So when you do set-context, you are just adding context details to your configuration file ~/.kube/config, but it doesn't switch you to that context, while use-context actually does.
Thus, as Vasily mentioned, in order to switch between clusters run
kubectl config use-context <CONTEXT-NAME>
Also, if you run kubectl config get-contexts you will see list of contexts with indication of the current one.
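For illustration, with the contexts from the question this could look roughly like the following (output trimmed):
$ kubectl config get-contexts
CURRENT   NAME                                                     CLUSTER
*         arn:aws:eks:us-west-2:91xxxxx371:cluster/eks-cluster-1   arn:aws:eks:us-west-2:91xxxxx371:cluster/eks-cluster-1
          arn:aws:eks:us-west-2:91xxxxx371:cluster/eks1            arn:aws:eks:us-west-2:91xxxxx371:cluster/eks1
$ kubectl config use-context arn:aws:eks:us-west-2:91xxxxx371:cluster/eks1
Switched to context "arn:aws:eks:us-west-2:91xxxxx371:cluster/eks1".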
Use
kubectl config use-context arn:aws:eks:us-west-2:91xxxxx371:cluster/eks-cluster-1
and
kubectl config use-context arn:aws:eks:us-west-2:91xxxxx371:cluster/eks1
Consider using kubectx for managing your contexts.
Usage
View all contexts (the current context is bolded):
$kubectx
arn:aws:eks:us-east-1:12234567:cluster/eks_app
->gke_my_second_cluster
my-rnd
my-prod
Switch to other context:
$ kubectx my-rnd
Switched to context "my-rnd".
Bonus:
In the same link - check also the kubens tool.
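If you want to try it, a hedged install sketch (Homebrew on macOS; other install methods are listed on the project page):
brew install kubectx   # the formula ships both kubectx and kubens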
This is the best command to switch between different EKS clusters.
I use it every day.
aws eks update-kubeconfig --name example
Documentation:
https://docs.aws.amazon.com/cli/latest/reference/eks/update-kubeconfig.html
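Combining this with context switching, a small sketch using the cluster names from the question (the --alias option gives each context a short name, assuming a reasonably recent AWS CLI):
aws eks update-kubeconfig --name eks-cluster-1 --alias eks-cluster-1
aws eks update-kubeconfig --name eks1 --alias eks1
# then switch with
kubectl config use-context eks1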