Attempting to share cluster access results in creating new cluster - amazon-web-services

I have deleted the config file I used when experimenting with Kubernetes on my AWS (using this tutorial) and replaced it with another devs config file when they set up Kubernetes on a shared AWS (using this). When I run kubectl config view I see the following above the users section:
- cluster:
certificate-authority-data: REDACTED
server: <removed>
name: aws_kubernetes
contexts:
- context:
cluster: aws_kubernetes
user: aws_kubernetes
name: aws_kubernetes
current-context: aws_kubernetes
This leads me to believe that my config should be pointing to use our shared AWS but whenever I run cluster/kube-up.sh it creates a new GCE cluster so I'm thinking I'm using the wrong command to spin up the cluster on AWS.
Am I using the wrong command/missing a flag/etc? Additionally I'm thinking kube-up creates a new cluster instead of recreating a previously instantiated one.

If you are sharing the cluster, you shouldn't need to run kube-up.sh (that script only needs to be run once to initially create a cluster). Once a cluster exists, you can use the standard kubectl commands to interact with it. Try starting with kubectl get nodes to verify that your configuration file has valid credentials and you see the expected AWS nodes printed in the output.

Related

How do I create an EKS cluster with nodes via CDK?

I'm able to deploy a Kubernetes Fargate cluster via CDK on my desired VPC:
const vpc = ec2.Vpc.fromLookup(this, 'vpc', {
vpcId: 'vpc-abcdefg'
})
const cluster = new eks.FargateCluster(this, 'sample-eks', {
version: eks.KubernetesVersion.V1_21,
vpc,
})
cluster.addNodegroupCapacity('node-group-capacity', {
minSize: 2,
maxSize: 2,
})
However, there are no nodes within this cluster:
$ kubectl config get-clusters
NAME
minikube
arn:aws:eks:us-east-1:<account_number>:cluster/<cluster_name>
$ kubectl get nodes
No resources found
Very confused as to why this is happening, as I thought the addNodegroupCapacity method is supposed to add nodes to the cluster. I think I can add nodes post-hoc via eksctl, but I was wondering if it'd be possible to deploy with nodes via CDK.
My mistake was not adding a role/user with sufficient permissions to the aws-auth ConfigMap. This meant that the cluster did not have proper permissions to create nodes. The following fixed my issue:
const role = iam.Role.fromRoleName(this, 'admin-role', '<my-admin-role>');
cluster.awsAuth.addRoleMapping(role, { groups: [ 'system:masters' ]});
The <my-admin-role> argument is the name of the role that I assume when I log in to AWS. I found it by running aws sts get-caller-identity, which returns a JSON doc that provides your assumed role's ARN. For me it was arn:aws:sts::<account-number>:assumed-role/<my-admin-role>/<my-username>.
This also resolved another issue, as I was not able to interact with the cluster via kubectl. I would get the following error message: error: You must be logged in to the server (Unauthorized). Adding my assumed role to the aws-auth ConfigMap gave me permission to access the cluster via my terminal.
Not sure why I haven't seen this bit of configuration in the tutorials I've used, would appreciate any comments that could help explain this to me.

AWS IAM Role - AccessDenied error in one pod

I have a service account which I am trying to use across multiple pods installed in the same namespace.
One of the pods is created by Airflow KubernetesPodOperator.
The other is created via Helm through Kubernetes deployment.
In the Airflow deployment, I see the IAM role being assigned and DynamoDB tables are created, listed etc however in the second helm chart deployment (or) in a test pod (created as shown here), I keep getting AccessDenied error for CreateTable in DynamoDB.
I can see the AWS Role ARN being assigned to the service account and the service account being applied to the pod and the corresponding token file also being created, but I see AccessDenied exception.
arn:aws:sts::1234567890:assumed-role/MyCustomRole/aws-sdk-java-1636152310195 is not authorized to perform: dynamodb:CreateTable on resource
ServiceAccount
Name: mypipeline-service-account
Namespace: abc-qa-daemons
Labels: app.kubernetes.io/managed-by=Helm
chart=abc-pipeline-main.651
heritage=Helm
release=ab-qa-pipeline
tier=mypipeline
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::1234567890:role/MyCustomRole
meta.helm.sh/release-name: ab-qa-pipeline
meta.helm.sh/release-namespace: abc-qa-daemons
Image pull secrets: <none>
Mountable secrets: mypipeline-service-account-token-6gm5b
Tokens: mypipeline-service-account-token-6gm5b
P.S: Both the client code created using KubernetesPodOperator and through Helm chart deployment is same i.e. same docker image. Other attributes like nodeSelector, tolerations etc, volume mounts are also same.
The describe pod output for both of them is similar with just some name and label changes.
The KubernetesPodOperator pod has QoS class as Burstable while the Helm chart ones is BestEffort.
Why do I get AccessDenied in Helm deployment but not in KubernetesPodOperator? How to debug this issue?
Whenever we get an AccessDenied exception, there can be two possible reasons:
You have assigned the wrong role
The assigned role doesn't have necessary permissions
In my case, latter is the issue. The permissions assigned to particular role can be sophisticated i.e. they can be more granular.
For example, in my case, the DynamoDB tables which the role can create/describe is limited to only those that are starting with a specific prefix but not all the DynamoDB tables.
So, it is always advisable to check the IAM role permissions whenever
you get this error.
As stated in the question, be sure to check the service account using the awscli image.
Keep in mind that, there is a credential provider chain used in AWS SDKs which determines the credentials to be used by the application. In most cases, the DefaultAWSCredentialsProviderChain is used and its order is given below. Ensure that the SDK is picking up the intended provider (in our case it is WebIdentityTokenCredentialsProvider)
super(new EnvironmentVariableCredentialsProvider(),
new SystemPropertiesCredentialsProvider(),
new ProfileCredentialsProvider(),
WebIdentityTokenCredentialsProvider.create(),
new EC2ContainerCredentialsProviderWrapper());
Additionally, you might also want to set the AWS SDK classes to DEBUG mode in your logger to see which credentials provider is being picked up and why.
To check if the service account is applied to a pod, describe it and check if the AWS environment variables are set to it like AWS_REGION, AWS_DEFAULT_REGION, AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE.
If not, then check your service account if it has the AWS annotation eks.amazonaws.com/role-arn by describing that service account.

Terraform EKS configmaps is forbidden

I am trying to deploy a Kubernetes cluster on AWS EKS using Terraform, run from a Gitlab CI pipeline. My code currently gets a full cluster up and running, except there is a step in which it tries to add the nodes (which are created separately) into the cluster.
When it tries to do this, this is the error I receive:
│ Error: configmaps is forbidden: User "system:serviceaccount:gitlab-managed-apps:default" cannot create resource "configmaps" in API group "" in the namespace "kube-system"
│
│ with module.mastercluster.kubernetes_config_map.aws_auth[0],
│ on .terraform/modules/mastercluster/aws_auth.tf line 63, in resource "kubernetes_config_map" "aws_auth":
│ 63: resource "kubernetes_config_map" "aws_auth" {
│
Terraform I believe is trying to edit the configmap aws_auth in the kube-system namespace, but for whatever reason, it doesn't have permission to do so?
I have found a different answer from years ago on Stackoverflow, that currently matches with what the documentation has to say about adding a aws_eks_cluster_auth data source and adding this to the kubernetes provider.
My configuration of this currently looks like this:
data "aws_eks_cluster" "mastercluster" {
name = module.mastercluster.cluster_id
}
data "aws_eks_cluster_auth" "mastercluster" {
name = module.mastercluster.cluster_id
}
provider "kubernetes" {
alias = "mastercluster"
host = data.aws_eks_cluster.mastercluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.mastercluster.certificate_authority[0].data)
token = data.aws_eks_cluster_auth.mastercluster.token
load_config_file = false
}
The weird thing is, this has worked for me before. I have successfully deployed multiple clusters using this method. This configuration is an almost identical copy to another one I had before, only the names of the clusters are different. I am totally lost as to why this can possibly go wrong.
Use semver to lock hashicorp provider versions
That's why is so important to use semver in terraform manifests.
As per Terraform documentation:
Terraform providers manage resources by communicating between Terraform and target APIs. Whenever the target APIs change or add functionality, provider maintainers may update and version the provider.
When multiple users or automation tools run the same Terraform configuration, they should all use the same versions of their required providers.
Use RBAC rules for Kubernetes
There is a Github issue filed about this: v2.0.1: Resources cannot be created. Does kubectl reference to kube config properly? · Issue #1127 · hashicorp/terraform-provider-kubernetes with the same error message as in yours case.
And one of the comments answers:
Offhand, this looks related to RBAC rules in the cluster (which may have been installed by the helm chart). This command might help diagnose the permissions issues relating to the service account in the error message.
$ kubectl auth can-i create namespace --as=system:serviceaccount:gitlab-prod:default
$ kubectl auth can-i --list --as=system:serviceaccount:gitlab-prod:default
You might be able to compare that list with other users on the cluster:
kubectl auth can-i --list --namespace=default --as=system:serviceaccount:default:default
$ kubectl auth can-i create configmaps
yes
$ kubectl auth can-i create configmaps --namespace=nginx-ingress --as=system:serviceaccount:gitlab-prod:default
no
And investigate related clusterroles:
$ kube describe clusterrolebinding system:basic-user
Name: system:basic-user
Labels: kubernetes.io/bootstrapping=rbac-defaults
Annotations: rbac.authorization.kubernetes.io/autoupdate: true
Role:
Kind: ClusterRole
Name: system:basic-user
Subjects:
Kind Name Namespace
---- ---- ---------
Group system:authenticated
$ kubectl describe clusterrole system:basic-user
Name: system:basic-user
Labels: kubernetes.io/bootstrapping=rbac-defaults
Annotations: rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
Resources Non-Resource URLs Resource Names Verbs
--------- ----------------- -------------- -----
selfsubjectaccessreviews.authorization.k8s.io [] [] [create]
selfsubjectrulesreviews.authorization.k8s.io [] [] [create]
My guess is that the chart or Terraform config in question is responsible for creating the service account, and the [cluster] roles and rolebindings, but it might be doing so in the wrong order, or not idempotently (so you get different results on re-install vs the initial install). But we would need to see a configuration that reproduces this error. In my testing of version 2 of the providers on AKS, EKS, GKE, and minikube, I haven't seen this issue come up.
Feel free to browse these working examples of building specific clusters and using them with Kubernetes and Helm providers. Giving the config a skim might give you some ideas for troubleshooting further.
Howto solve RBAC issues
As for the error
Error: configmaps is forbidden: User "system:serviceaccount:kube-system:default" cannot list
There is great explanation by #m-abramovich:
First, some information for newbies.
In Kubernetes there are:
Account - something like your ID. Example: john
Role - some group in the project permitted to do something. Examples: cluster-admin, it-support, ...
Binding - joining Account to Role. "John in it-support" - is a binding.
Thus, in our message above, we see that our Tiller acts as account "default" registered at namespace "kube-system". Most likely you didn't bind him to a sufficient role.
Now back to the problem.
How do we track it:
check if you have specific account for tiller. Usually it has same name - "tiller":
kubectl [--namespace kube-system] get serviceaccount
create if not:
kubectl [--namespace kube-system] create serviceaccount tiller
check if you have role or clusterrole (cluster role is "better" for newbies - it is cluster-wide unlike namespace-wide role). If this is not a production, you can use highly privileged role "cluster-admin":
kubectl [--namespace kube-system] get clusterrole
you can check role content via:
kubectl [--namespace kube-system] get clusterrole cluster-admin -o yaml
check if account "tiller" in first clause has a binding to clusterrole "cluster-admin" that you deem sufficient:
kubectl [--namespace kube-system] get clusterrolebinding
if it is hard to figure out based on names, you can simply create new:
kubectl [--namespace kube-system] create clusterrolebinding tiller-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
finally, when you have the account, the role and the binding between them, you can check if you really act as this account:
kubectl [--namespace kube-system] get deploy tiller-deploy -o yaml
I suspect that your output will not have settings "serviceAccount" and "serviceAccountName":
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
if yes, than add an account you want tiller to use:
kubectl [--namespace kube-system] patch deploy tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
(if you use PowerShell, then check below for post from #snpdev)
Now you repeat previous check command and see the difference:
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: tiller <-- new line
serviceAccountName: tiller <-- new line
terminationGracePeriodSeconds: 30
Resources:
Using RBAC Authorization | Kubernetes
Demystifying RBAC in Kubernetes | Cloud Native Computing Foundation
Helm | Role-based Access Control
Lock and Upgrade Provider Versions | Terraform - HashiCorp Learn

EksCtl : Update node-definitions via cluster config file not working

I am using eksctl to create our EKS cluster.
For the first run, it works out good, but if I want to upgrade the cluster-config later in the future, it's not working.
I have a cluster-config file with me, but any changes made to it are not reflect with update/upgrade command.
What am I missing?
Cluster.yaml :
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
name: supplier-service
region: eu-central-1
vpc:
subnets:
public:
eu-central-1a: {id: subnet-1}
eu-central-1b: {id: subnet-2}
eu-central-1c: {id: subnet-2}
nodeGroups:
- name: ng-1
instanceType: t2.medium
desiredCapacity: 3
ssh:
allow: true
securityGroups:
withShared: true
withLocal: true
attachIDs: ['sg-1', 'sg-2']
iam:
withAddonPolicies:
autoScaler: true
Now, if in the future, I would like to make change to instance.type or replicas, I have to destroy entire cluster and recreate...which becomes quite cumbersome.
How can I do in-place upgrades with clusters created by EksCtl? Thank you.
I'm looking into the exact same issue as yours.
After a bunch of searches against the Internet, I found that it is not possible yet to in-place upgrade your existing node group in EKS.
First, eksctl update has become deprecated. When I executed eksctl upgrade --help, it gave a warning like this:
DEPRECATED: use 'upgrade cluster' instead. Upgrade control plane to the next version.
Second, as mentioned in this GitHub issue and eksctl document, up to now the eksctl upgrade nodegroup is used only for upgrading the version of managed node group.
So unfortunately, you'll have to create a new node group to apply your changes, migrate your workload/switch your traffic to new node group and decommission the old one. In your case, it's not necessary to nuke the entire cluster and recreate.
If you're seeking for seamless upgrade/migration with minimum/zero down time, I suggest you try managed node group, in which the graceful draining of workload seems promising:
Node updates and terminations gracefully drain nodes to ensure that your applications stay available.
Note: in your config file above, if you specify nodeGroups rather than managedNodeGroups, an unmanaged node group will be provisioned.
However, don't lose hope. An active issue in eksctl GitHub repository has been lodged to add eksctl apply option. At this stage it's not yet released. Would be really nice if this came true.
To upgrade the cluster using eksctl:
Upgrade the control plane version
Upgrade coredns, kube-proxy and aws-node
Upgrade the worker nodes
If you just want to update nodegroup and keep the same configuration, you can just change nodegroup names, e.g. append -v2 to the name. [0]
If you want to change the node group configuration 'instance type', you need to just create a new node group: eksctl create nodegroup --config-file=dev-cluster.yaml [1]
[0] https://eksctl.io/usage/cluster-upgrade/#updating-multiple-nodegroups-with-config-file
[1] https://eksctl.io/usage/managing-nodegroups/#creating-a-nodegroup-from-a-config-file

AWS EKS kubectl - No resources found in default namespace

Trying to setup a EKS cluster.
An error occurred (AccessDeniedException) when calling the DescribeCluster operation: Account xxx is not authorized to use this service. This error came form the CLI, on the console I was able to crate the cluster and everything successfully.
I am logged in as the root user (its just my personal account).
It says Account so sounds like its not a user/permissions issue?
Do I have to enable my account for this service? I don't see any such option.
Also if login as a user (rather than root) - will I be able to see everything that was earlier created as root. I have now created a user and assigned admin and eks* permissions. I checked this when I sign in as the user - I can see everything.
The aws cli was setup with root credentials (I think) - so do I have to go back and undo fix all this and just use this user.
Update 1
I redid/restarted everything including user and awscli configure - just to make sure. But still the issue did not get resolved.
There is an option to create the file manually - that finally worked.
And I was able to : kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.100.0.1 443/TCP 48m
KUBECONFIG: I had setup the env:KUBECONFIG
$env:KUBECONFIG="C:\Users\sbaha\.kube\config-EKS-nginixClstr"
$Env:KUBECONFIG
C:\Users\sbaha\.kube\config-EKS-nginixClstr
kubectl config get-contexts
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
* aws kubernetes aws
kubectl config current-context
aws
My understanding is is I should see both the aws and my EKS-nginixClstr contexts but I only see aws - is this (also) a issue?
Next Step is to create and add worker nodes. I updated the node arn correctly in the .yaml file: kubectl apply -f ~\.kube\aws-auth-cm.yaml
configmap/aws-auth configured So this perhaps worked.
But next it fails:
kubectl get nodes No resources found in default namespace.
On AWS Console NodeGrp shows- Create Completed. Also on CLI kubectl get nodes --watch - it does not even return.
So this this has to be debugged next- (it never ends)
aws-auth-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: aws-auth
namespace: kube-system
data:
mapRoles: |
- rolearn: arn:aws:iam::arn:aws:iam::xxxxx:role/Nginix-NodeGrpClstr-NodeInstanceRole-1SK61JHT0JE4
username: system:node:{{EC2PrivateDNSName}}
groups:
- system:bootstrappers
- system:nodes
This problem was related to not having the correct version of eksctl - it must be at least 0.7.0. The documentation states this and I knew this, but initially whatever I did could not get beyond 0.6.0. The way you get it is to configure your AWS CLI to a region that supports EKS. Once you get 0.7.0 this issue gets resolved.
Overall to make EKS work - you must have the same user both on console and CLI, and you must work on a region that supports EKS, and have correct eksctl version 0.7.0.