I am trying to deploy a Kubernetes cluster on AWS EKS using Terraform, run from a Gitlab CI pipeline. My code currently gets a full cluster up and running, except there is a step in which it tries to add the nodes (which are created separately) into the cluster.
When it tries to do this, this is the error I receive:
│ Error: configmaps is forbidden: User "system:serviceaccount:gitlab-managed-apps:default" cannot create resource "configmaps" in API group "" in the namespace "kube-system"
│
│ with module.mastercluster.kubernetes_config_map.aws_auth[0],
│ on .terraform/modules/mastercluster/aws_auth.tf line 63, in resource "kubernetes_config_map" "aws_auth":
│ 63: resource "kubernetes_config_map" "aws_auth" {
│
Terraform, I believe, is trying to edit the aws_auth ConfigMap in the kube-system namespace, but for whatever reason it doesn't have permission to do so.
I have found an older answer on Stack Overflow that matches what the documentation says about adding an aws_eks_cluster_auth data source and passing its token to the kubernetes provider.
My configuration of this currently looks like this:
data "aws_eks_cluster" "mastercluster" {
name = module.mastercluster.cluster_id
}
data "aws_eks_cluster_auth" "mastercluster" {
name = module.mastercluster.cluster_id
}
provider "kubernetes" {
alias = "mastercluster"
host = data.aws_eks_cluster.mastercluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.mastercluster.certificate_authority[0].data)
token = data.aws_eks_cluster_auth.mastercluster.token
load_config_file = false
}
The weird thing is, this has worked for me before. I have successfully deployed multiple clusters using this method. This configuration is an almost identical copy of one I had before; only the cluster names are different. I am totally lost as to why this could possibly go wrong.
Use semver to lock hashicorp provider versions
This is why it is so important to use semver to pin provider versions in Terraform manifests.
As per Terraform documentation:
Terraform providers manage resources by communicating between Terraform and target APIs. Whenever the target APIs change or add functionality, provider maintainers may update and version the provider.
When multiple users or automation tools run the same Terraform configuration, they should all use the same versions of their required providers.
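A minimal sketch of such pinning (the version constraints below are illustrative; pick the versions your configuration was actually written against):

terraform {
  required_version = ">= 0.14"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0" # allow only 3.x minor/patch updates
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 1.13" # v2.x removed load_config_file, which would break the config above
    }
  }
}

With constraints like these (and a committed .terraform.lock.hcl), a CI runner cannot silently pick up a new provider major version with breaking changes.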
Use RBAC rules for Kubernetes
There is a GitHub issue filed about this, v2.0.1: Resources cannot be created. Does kubectl reference to kube config properly? · Issue #1127 · hashicorp/terraform-provider-kubernetes, with the same error message as in your case.
And one of the comments answers:
Offhand, this looks related to RBAC rules in the cluster (which may have been installed by the helm chart). This command might help diagnose the permissions issues relating to the service account in the error message.
$ kubectl auth can-i create namespace --as=system:serviceaccount:gitlab-prod:default
$ kubectl auth can-i --list --as=system:serviceaccount:gitlab-prod:default
You might be able to compare that list with other users on the cluster:
$ kubectl auth can-i --list --namespace=default --as=system:serviceaccount:default:default
$ kubectl auth can-i create configmaps
yes
$ kubectl auth can-i create configmaps --namespace=nginx-ingress --as=system:serviceaccount:gitlab-prod:default
no
And investigate related clusterroles:
$ kubectl describe clusterrolebinding system:basic-user

Name:         system:basic-user
Labels:       kubernetes.io/bootstrapping=rbac-defaults
Annotations:  rbac.authorization.kubernetes.io/autoupdate: true
Role:
  Kind:  ClusterRole
  Name:  system:basic-user
Subjects:
  Kind   Name                  Namespace
  ----   ----                  ---------
  Group  system:authenticated

$ kubectl describe clusterrole system:basic-user

Name:         system:basic-user
Labels:       kubernetes.io/bootstrapping=rbac-defaults
Annotations:  rbac.authorization.kubernetes.io/autoupdate: true
PolicyRule:
  Resources                                      Non-Resource URLs  Resource Names  Verbs
  ---------                                      -----------------  --------------  -----
  selfsubjectaccessreviews.authorization.k8s.io  []                 []              [create]
  selfsubjectrulesreviews.authorization.k8s.io   []                 []              [create]
My guess is that the chart or Terraform config in question is responsible for creating the service account, and the [cluster] roles and rolebindings, but it might be doing so in the wrong order, or not idempotently (so you get different results on re-install vs the initial install). But we would need to see a configuration that reproduces this error. In my testing of version 2 of the providers on AKS, EKS, GKE, and minikube, I haven't seen this issue come up.
Feel free to browse these working examples of building specific clusters and using them with Kubernetes and Helm providers. Giving the config a skim might give you some ideas for troubleshooting further.
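For instance, if the service account and its binding are managed from Terraform, expressing the binding through resource references lets Terraform create the objects in the right order on every run. A minimal, hypothetical sketch (the account and binding names are illustrative, and cluster-admin is far broader than you would want in production):

resource "kubernetes_service_account" "gitlab_runner" {
  metadata {
    name      = "gitlab-runner"
    namespace = "gitlab-managed-apps"
  }
}

resource "kubernetes_cluster_role_binding" "gitlab_runner_admin" {
  metadata {
    name = "gitlab-runner-admin"
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin"
  }

  subject {
    kind = "ServiceAccount"
    # referencing the resource above makes Terraform create the account first
    name      = kubernetes_service_account.gitlab_runner.metadata[0].name
    namespace = kubernetes_service_account.gitlab_runner.metadata[0].namespace
  }
}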
How to solve RBAC issues
As for the error
Error: configmaps is forbidden: User "system:serviceaccount:kube-system:default" cannot list
There is a great explanation by @m-abramovich:
First, some information for newbies.
In Kubernetes there are:
Account - something like your ID. Example: john
Role - a group in the project permitted to do something. Examples: cluster-admin, it-support, ...
Binding - joining an Account to a Role. "John in it-support" is a binding.
Thus, in the message above, we see that Tiller acts as the account "default" registered in the namespace "kube-system". Most likely you didn't bind it to a sufficient role.
Now back to the problem.
How do we track it:
Check whether you have a specific account for Tiller. Usually it has the same name, "tiller":
kubectl [--namespace kube-system] get serviceaccount
Create it if not:
kubectl [--namespace kube-system] create serviceaccount tiller
Check whether you have a Role or ClusterRole (a ClusterRole is "better" for newbies, since it is cluster-wide, unlike a namespace-wide Role). If this is not production, you can use the highly privileged role cluster-admin:
kubectl [--namespace kube-system] get clusterrole
You can check a role's content via:
kubectl [--namespace kube-system] get clusterrole cluster-admin -o yaml
Check whether the "tiller" account from the first step has a binding to a ClusterRole (such as "cluster-admin") that you deem sufficient:
kubectl [--namespace kube-system] get clusterrolebinding
If it is hard to figure out based on the names, you can simply create a new one:
kubectl [--namespace kube-system] create clusterrolebinding tiller-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
Finally, when you have the account, the role, and the binding between them, you can check whether you really act as this account:
kubectl [--namespace kube-system] get deploy tiller-deploy -o yaml
I suspect that your output will not have the settings "serviceAccount" and "serviceAccountName":
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
If so, then add the account you want Tiller to use:
kubectl [--namespace kube-system] patch deploy tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'
(if you use PowerShell, then check the post from @snpdev below)
Now repeat the previous check command and see the difference:
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: tiller <-- new line
serviceAccountName: tiller <-- new line
terminationGracePeriodSeconds: 30
Resources:
Using RBAC Authorization | Kubernetes
Demystifying RBAC in Kubernetes | Cloud Native Computing Foundation
Helm | Role-based Access Control
Lock and Upgrade Provider Versions | Terraform - HashiCorp Learn
Related
On GCP, I need to use two GCP projects: one for the web application, the other for storing the web application's secrets (this structure comes from Google's repository).
As written in the README, I'll store secrets using GCP Secret Manager:
This project is allocated for GCP Secret Manager for secrets shared by the organization.
The procedure I'm planning:
prj-secret: create secrets in Secret Manager
prj-application: read secrets using kubernetes-external-secrets
In prj-application I want to use Workload Identity, because I don't want to use a service account key, as the doc advises.
What I did
Create the cluster with the --workload-pool=project-id.svc.id.goog option
helm install kubernetes-external-secrets
[skip] kubectl create namespace k8s-namespace (because I install kubernetes-external-secrets in the default namespace)
[skip] kubectl create serviceaccount --namespace k8s-namespace ksa-name (because I use the default service account, which exists by default when creating GKE)
Create the Google service account with the workload-identity module:
module "workload-identity" {
  source              = "github.com/terraform-google-modules/terraform-google-kubernetes-engine//modules/workload-identity"
  use_existing_k8s_sa = true
  cluster_name        = var.cluster_name
  location            = var.cluster_location
  k8s_sa_name         = "external-secrets-kubernetes-external-secrets"
  name                = "external-secrets-kubernetes"
  roles               = ["roles/secretmanager.admin", "roles/secretmanager.secretAccessor"]
  project_id          = var.project_id # prj-application's project_id
}
The Kubernetes ServiceAccount called external-secrets-kubernetes-external-secrets was already created when installing kubernetes-external-secrets with Helm, and it binds k8s_sa_name to external-secrets-kubernetes@my-project-id.iam.gserviceaccount.com, which has ["roles/secretmanager.admin", "roles/secretmanager.secretAccessor"].
Create an ExternalSecret and apply it:
apiVersion: kubernetes-client.io/v1
kind: ExternalSecret
metadata:
  name: external-key-test
spec:
  backendType: gcpSecretsManager
  projectId: my-domain
  data:
    - key: key-test
      name: password
Result
I got a permission problem:
ERROR, 7 PERMISSION_DENIED: Permission 'secretmanager.versions.access' denied for resource 'projects/project-id/secrets/external-key-test/versions/latest' (or it may not exist).
I already checked that if prj-secret and prj-application are the same project, it works.
So what I think is:
the Kubernetes service account (in prj-secret) and the Google service account (in prj-application) cannot bind correctly.
I wonder if someone knows:
whether Workload Identity works only within the same project or not,
and if so, how I can get secret data from a different project.
Thank you.
I think you have an issue in your role binding. When you say this:
The Kubernetes ServiceAccount called external-secrets-kubernetes-external-secrets was already created when installing kubernetes-external-secrets with Helm, and it binds k8s_sa_name to external-secrets-kubernetes@my-project-id.iam.gserviceaccount.com, which has ["roles/secretmanager.admin", "roles/secretmanager.secretAccessor"].
It's unclear.
external-secrets-kubernetes@my-project-id.iam.gserviceaccount.com is created in which project? I guess in prj-application, but it's not clear.
I'll take the assumption (from the name and the link with the cluster) that the service account is created in prj-application. You grant the roles "roles/secretmanager.admin" and "roles/secretmanager.secretAccessor" on which resource?
On the IAM page of prj-application?
On the IAM page of prj-secret?
On the secret ID of the secret in prj-secret?
If you did the first one, it's the wrong binding: the service account can only access the secrets of prj-application, not those of prj-secret.
Note: if you only need to access the secret, don't grant the admin role; only the accessor role is required.
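If the grant was made in the wrong place, the fix is to grant the role in prj-secret, the project that actually holds the secrets. A minimal Terraform sketch, assuming the Google service account lives in prj-application (the project IDs and email are illustrative):

# Grant the GSA from prj-application read access to secrets in prj-secret.
resource "google_project_iam_member" "external_secrets_accessor" {
  project = "prj-secret"
  role    = "roles/secretmanager.secretAccessor"
  member  = "serviceAccount:external-secrets-kubernetes@prj-application.iam.gserviceaccount.com"
}

For finer scoping, the same member and role can instead be granted on a single secret with google_secret_manager_secret_iam_member.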
I'm following the official AWS EKS tutorial on setting up a distributed GPU cluster for Tensorflow model training and am hitting a bit of a snag.
After creating a new cluster using eksctl and verifying that the corresponding ~/.kube/config file exists on my gateway node, the tutorial instructs that I download ksonnet on the gateway node and use it to initialize a new application:
$ ks init <app-name>
When I try running this, however, I receive the following error:
INFO Using context "arn:aws:eks:us-west-2:131397771409:cluster/<cluster name>" from kubeconfig file "/home/ubuntu/.kube/config"
INFO Creating environment "default" with namespace "default", pointing to "version:v1.18.9" cluster at address <cluster address>
ERROR No Major.Minor.Patch elements found
I've done some searching around on Github/SO, but have not been able to find a resolution to this issue. I suspect the true answer is to move away from using ksonnet, as it is no longer being maintained (and hasn't been for the last 2 years it appears), but for the time being I'd just like to be able to complete the tutorial :)
Any insight is appreciated!
Contents of my ~/.kube/config:
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <certificate>
    server: <server>
  name: arn:aws:eks:us-west-2:131397771409:cluster/<name>
contexts:
- context:
    cluster: arn:aws:eks:us-west-2:131397771409:cluster/<name>
    user: arn:aws:eks:us-west-2:131397771409:cluster/<name>
  name: arn:aws:eks:us-west-2:131397771409:cluster/<name>
current-context: arn:aws:eks:us-west-2:131397771409:cluster/<name>
kind: Config
preferences: {}
users:
- name: arn:aws:eks:us-west-2:131397771409:cluster/<name>
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      args:
      - --region
      - us-west-2
      - eks
      - get-token
      - --cluster-name
      - <name>
      command: aws
On the init, you can override the api spec version (that worked for me on that particular step although I got into other issues later on):
ks init ${APP_NAME} --api-spec=version:v1.7.0
Reference
In the end, I made it work with ks init ${APP_NAME} (without --api-spec) in GCP using ksonnet v0.13.1 on old kubeflow (v0.2.0-rc.1) and GKE cluster (1.14.10) versions.
BTW, I was in "Kubeflow: End to End" qwiklab from this course.
Trying to set up an EKS cluster.
An error occurred (AccessDeniedException) when calling the DescribeCluster operation: Account xxx is not authorized to use this service. This error came from the CLI; on the console I was able to create the cluster and everything successfully.
I am logged in as the root user (it's just my personal account).
It says Account, so it sounds like it's not a user/permissions issue?
Do I have to enable my account for this service? I don't see any such option.
Also, if I log in as a user (rather than root), will I be able to see everything that was earlier created as root? I have now created a user and assigned admin and eks* permissions. I checked this: when I sign in as the user, I can see everything.
The AWS CLI was set up with root credentials (I think), so do I have to go back, undo all this, and just use this user?
Update 1
I redid/restarted everything, including the user and aws configure, just to make sure, but the issue still did not get resolved.
There is an option to create the kubeconfig file manually; that finally worked.
And I was able to : kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.100.0.1   <none>        443/TCP   48m
KUBECONFIG: I had set up the env:KUBECONFIG variable:
$env:KUBECONFIG="C:\Users\sbaha\.kube\config-EKS-nginixClstr"
$Env:KUBECONFIG
C:\Users\sbaha\.kube\config-EKS-nginixClstr
kubectl config get-contexts
CURRENT   NAME   CLUSTER      AUTHINFO   NAMESPACE
*         aws    kubernetes   aws
kubectl config current-context
aws
My understanding is that I should see both the aws and my EKS-nginixClstr contexts, but I only see aws. Is this (also) an issue?
The next step is to create and add worker nodes. I updated the node ARN in the .yaml file and ran: kubectl apply -f ~\.kube\aws-auth-cm.yaml
It returned configmap/aws-auth configured, so this perhaps worked.
But the next step fails:
kubectl get nodes returns No resources found in default namespace.
On the AWS console the node group shows "Create Completed". Also, on the CLI, kubectl get nodes --watch does not even return.
So this has to be debugged next (it never ends).
aws-auth-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::xxxxx:role/Nginix-NodeGrpClstr-NodeInstanceRole-1SK61JHT0JE4
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
This problem was related to not having the correct version of eksctl: it must be at least 0.7.0. The documentation states this and I knew it, but initially nothing I did could get me beyond 0.6.0. The way to get it is to configure your AWS CLI for a region that supports EKS. Once you have 0.7.0, this issue is resolved.
Overall, to make EKS work you must use the same user on both the console and the CLI, work in a region that supports EKS, and have eksctl version at least 0.7.0.
I'm setting up an AWS EKS cluster using Terraform from an EC2 instance. Basically the setup includes an EC2 launch configuration and autoscaling for worker nodes. After creating the cluster, I am able to configure kubectl with aws-iam-authenticator. When I ran
kubectl get nodes
It returned
No resources found
as the worker nodes had not joined. So I tried updating the aws-auth-cm.yaml file:
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: <ARN of instance role (not instance profile)>
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
with the IAM role ARN of the worker node, and ran
kubectl apply -f aws-auth-cm.yaml
It returned
ConfigMap/aws-auth created
Then I understood that the role ARN configured in aws-auth-cm.yaml was the wrong one. So I updated the same file with the exact worker node role ARN.
But this time I got 403 when I did kubectl apply -f aws-auth-cm.yaml again.
It returned
Error from server (Forbidden): error when retrieving current
configuration of: Resource: "/v1, Resource=configmaps",
GroupVersionKind: "/v1, Kind=ConfigMap" Name: "aws-auth", Namespace:
"kube-system" Object: &{map["apiVersion":"v1" "data":map["mapRoles":"-
rolearn: arn:aws:iam::XXXXXXXXX:role/worker-node-role\n username:
system:node:{{EC2PrivateDNSName}}\n groups:\n -
system:bootstrappers\n - system:nodes\n"] "kind":"ConfigMap"
"metadata":map["name":"aws-auth" "namespace":"kube-system"
"annotations":map["kubectl.kubernetes.io/last-applied-configuration":""]]]}
from server for: "/home/username/aws-auth-cm.yaml": configmaps
"aws-auth" is forbidden: User
"system:node:ip-XXX-XX-XX-XX.ec2.internal" cannot get resource
"configmaps" in API group "" in the namespace "kube-system"
I'm not able to reconfigure the ConfigMap after this step.
I'm getting 403 for commands like
kubectl apply
kubectl delete
kubectl edit
for configmaps. Any help?
I found the reason why kubectl returned 403 for this scenario.
As per this doc, the user/role who created the cluster will be given system:masters permissions in the cluster's RBAC configuration
When I tried to create a ConfigMap for aws-auth to join the worker nodes, I gave the ARN of the role/user who created the cluster instead of the ARN of the worker nodes.
This replaced the admin's group (system:masters) in RBAC with the groups system:bootstrappers and system:nodes, which basically locked out the admin. And it is not recoverable, since the admin has lost the privileges of the system:masters group.
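If you manage aws-auth from Terraform rather than hand-editing YAML, the mapping is easier to review before it is applied. A minimal, hypothetical sketch (aws_iam_role.worker_node is an assumed resource name; the mapped role must be the worker nodes' instance role, never the cluster creator's):

resource "kubernetes_config_map" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    mapRoles = yamlencode([
      {
        rolearn  = aws_iam_role.worker_node.arn # worker node instance role (assumed name)
        username = "system:node:{{EC2PrivateDNSName}}"
        groups   = ["system:bootstrappers", "system:nodes"]
      }
    ])
  }
}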
I was testing some commands and I ran
$ kubectl delete nodes --all
and it deleted (de-registered) all the nodes, including the masters. Now I can't connect to the cluster (well, obviously, as the master is deleted).
Is there a way to prevent this, as anyone could accidentally do it?
Extra Info: I am using KOps for deployment.
P.S. It does not delete the EC2 instances, and the nodes come back up after rebooting all the EC2 instances.
By default, you are using something like a superuser who can do anything they want with the cluster.
To limit other users' access to the cluster, you can use RBAC authorization. With RBAC rules you can manage access and limits per resource and action.
In a few words, to do that you need to:
Create a new cluster with kOps using --authorization RBAC, or modify an existing one by adding the rbac option to the authorization section of the cluster's configuration:
authorization:
  rbac: {}
Now we can follow the instructions from Bitnami to create a user. For example, let's create a user who has access only to the office namespace and only for a few actions. First, we need to create the namespace:
kubectl create namespace office
Create a key and a certificate signing request for the new user:
openssl genrsa -out employee.key 2048
openssl req -new -key employee.key -out employee.csr -subj "/CN=employee/O=bitnami"
Now, using your CA key (available in the kOps S3 bucket under PKI), we need to sign the new certificate:
openssl x509 -req -in employee.csr -CA CA_LOCATION/ca.crt -CAkey CA_LOCATION/ca.key -CAcreateserial -out employee.crt -days 500
Create the credentials:
kubectl config set-credentials employee --client-certificate=/home/employee/.certs/employee.crt --client-key=/home/employee/.certs/employee.key
Set the right context:
kubectl config set-context employee-context --cluster=YOUR_CLUSTER_NAME --namespace=office --user=employee
Now we have a user without access to anything. Let's create a new role with limited access. Here is an example of a Role that grants access only to Deployments, ReplicaSets, and Pods, allowing only to create, delete, and modify them and nothing more. Create a file role-deployment-manager.yaml with the Role configuration:
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  namespace: office
  name: deployment-manager
rules:
- apiGroups: ["", "extensions", "apps"]
  resources: ["deployments", "replicasets", "pods"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
Create a new file rolebinding-deployment-manager.yaml with a RoleBinding, which attaches your Role to the user:
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: deployment-manager-binding
  namespace: office
subjects:
- kind: User
  name: employee
  apiGroup: ""
roleRef:
  kind: Role
  name: deployment-manager
  apiGroup: ""
Now apply those configurations:
kubectl create -f role-deployment-manager.yaml
kubectl create -f rolebinding-deployment-manager.yaml
So now you have a user with limited access who cannot destroy your cluster.
Anton Kostenko describes a good way of preventing what you've described. Below I give details of how you can ensure the apiserver remains accessible even if someone does accidentally delete all the node objects:
Losing connectivity to the apiserver by deleting node objects will only happen if the components necessary for connecting to the apiserver (e.g. the apisever itself and etcd) are managed by a component (i.e. the kubelet) that depends on the apiserver being up (GKE for example can scale down to 0 worker nodes, leaving no node objects, but the apiserver will still be accessible).
As a specific example, my personal cluster has a single master node with all the control plane components described as static Pod manifests and placed in the directory referred to by the --pod-manifest-path flag on the kubelet on that master node. Deleting all the node objects as you did in the question caused all my workloads to go into a pending state but the apiserver was still accessible in this case because the control plane components are run regardless of whether the kubelet can access the apiserver.
Common ways to prevent what you've just described are to run the apiserver and etcd as static manifests managed by the kubelet, as I just described, or to run them independently of any kubelet, perhaps as systemd units.