Not authorized to perform sts:AssumeRoleWithWebIdentity- 403 - amazon-web-services

I have been trying to run an external-dns pod using the guide provided by the k8s-sig group. I have followed every step of the guide, and I am getting the error below.
time="2021-02-27T13:27:20Z" level=error msg="records retrieval failed: failed to list hosted zones: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: 87a3ca86-ceb0-47be-8f90-25d0c2de9f48"
I had created the AWS IAM policy using Terraform, and it was created successfully. Everything has been spun up via Terraform, except the IAM role for the service account, for which I used eksctl.
But then I came across this article, which says that creating the AWS IAM policy using the awscli would eliminate this error. So I deleted the policy created with Terraform and recreated it with the awscli. Yet it is throwing the same error.
Below is my external dns yaml file.
apiVersion: v1
kind: ServiceAccount
metadata:
name: external-dns
# If you're using Amazon EKS with IAM Roles for Service Accounts, specify the following annotation.
# Otherwise, you may safely omit it.
annotations:
# Substitute your account ID and IAM service role name below.
eks.amazonaws.com/role-arn: arn:aws:iam::268xxxxxxx:role/eksctl-ats-Eks1-addon-iamserviceaccoun-Role1-WMLL93xxxx
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: external-dns
rules:
- apiGroups: [""]
resources: ["services","endpoints","pods"]
verbs: ["get","watch","list"]
- apiGroups: ["extensions","networking.k8s.io"]
resources: ["ingresses"]
verbs: ["get","watch","list"]
- apiGroups: [""]
resources: ["nodes"]
verbs: ["list","watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: external-dns-viewer
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: external-dns
subjects:
- kind: ServiceAccount
name: external-dns
namespace: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: external-dns
spec:
strategy:
type: Recreate
selector:
matchLabels:
app: external-dns
template:
metadata:
labels:
app: external-dns
spec:
serviceAccountName: external-dns
containers:
- name: external-dns
image: k8s.gcr.io/external-dns/external-dns:v0.7.6
args:
- --source=service
- --source=ingress
- --domain-filter=xyz.com # will make ExternalDNS see only the hosted zones matching provided domain, omit to process all available hosted zones
- --provider=aws
- --policy=upsert-only # would prevent ExternalDNS from deleting any records, omit to enable full synchronization
- --aws-zone-type=public # only look at public hosted zones (valid values are public, private or no value for both)
- --registry=txt
- --txt-owner-id=Z0471542U7WSPZxxxx
securityContext:
fsGroup: 65534 # For ExternalDNS to be able to read Kubernetes and AWS token files
I am scratching my head, as there is no proper solution to this error anywhere on the net. Hoping to find a solution to this issue in this forum.
The end result should show something like the below and populate records in the hosted zone.
time="2020-05-05T02:57:31Z" level=info msg="All records are already up to date"

I also struggled with this error.
The problem was in the definition of the trust relationship.
You can see the following setup in some official AWS tutorials (like this one):
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"${OIDC_PROVIDER}:sub": "system:serviceaccount:<my-namespace>:<my-service-account>"
}
}
}
]
}
Option 1 for failure
My problem was that I passed a wrong value for my-service-account at the end of ${OIDC_PROVIDER}:sub in the Condition part.
Option 2 for failure
After the previous fix I still faced the same error. It was solved by following this AWS tutorial, which shows the output of using eksctl with the command below:
eksctl create iamserviceaccount \
--name my-serviceaccount \
--namespace <your-ns> \
--cluster <your-cluster-name> \
--attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
--approve
When you look at the output in the trust relationship tab in the AWS web console, you can see that an additional condition was added with the postfix :aud and the value sts.amazonaws.com.
So this needs to be added after the "${OIDC_PROVIDER}:sub" condition.
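If you prefer to patch the existing role rather than recreate it, a minimal sketch of doing that from the CLI could look like the below (the role name, account ID, OIDC provider, namespace and service account are placeholders you must substitute):
cat > trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<AWS_ACCOUNT_ID>:oidc-provider/<OIDC_PROVIDER>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "<OIDC_PROVIDER>:sub": "system:serviceaccount:<my-namespace>:<my-service-account>",
          "<OIDC_PROVIDER>:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
EOF
# Replace the role's trust policy with the version that carries both conditions
aws iam update-assume-role-policy --role-name <my-role-name> --policy-document file://trust.json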

I was able to get help from the Kubernetes Slack (shout out to #Rob Del) and this is what we came up with. There's nothing wrong with the k8s rbac from the article, the issue is the way the IAM role is written. I am using Terraform v0.12.24, but I believe something similar to the following .tf should work for Terraform v0.14:
data "aws_caller_identity" "current" {}
resource "aws_iam_role" "external_dns_role" {
name = "external-dns"
assume_role_policy = jsonencode({
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": format(
"arn:aws:iam::${data.aws_caller_identity.current.account_id}:%s",
replace(
"${aws_eks_cluster.<YOUR_CLUSTER_NAME>.identity[0].oidc[0].issuer}",
"https://",
"oidc-provider/"
)
)
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
format(
"%s:sub",
trimprefix(
"${aws_eks_cluster.<YOUR_CLUSTER_NAME>.identity[0].oidc[0].issuer}",
"https://"
)
) : "system:serviceaccount:default:external-dns"
}
}
}
]
})
}
The above .tf assumes you created your EKS cluster using Terraform and that you use the RBAC manifest from the external-dns tutorial.
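If you want to double-check what Terraform actually produced, a quick way (assuming the role is named external-dns as above) is to dump the trust policy and compare the :sub value with the ServiceAccount's namespace and name:
# Trust policy as stored on the role
aws iam get-role --role-name external-dns --query 'Role.AssumeRolePolicyDocument' --output json
# Namespace/name the condition has to match (system:serviceaccount:default:external-dns)
kubectl get sa external-dns -n default -o yaml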

There are a few possibilities here.
Before anything else, does your cluster have an OIDC provider associated with it? IRSA won't work without it.
You can check that in the AWS console, or via the CLI with:
aws eks describe-cluster --name {name} --query "cluster.identity.oidc.issuer"
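If that query returns an issuer URL but IAM has no matching OIDC provider, IRSA will fail exactly like this. A rough check and fix (the eksctl command is the usual shortcut for creating the provider):
# Is there a provider whose ID matches the issuer above?
aws iam list-open-id-connect-providers
# If not, associate one with the cluster
eksctl utils associate-iam-oidc-provider --cluster {name} --approve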
First
Delete the iamserviceaccount, recreate it, remove the ServiceAccount definition from your ExternalDNS manifest (the entire first section), and re-apply it.
eksctl delete iamserviceaccount --name {name} --namespace {namespace} --cluster {cluster}
eksctl create iamserviceaccount --name {name} --namespace {namespace} --cluster {cluster} --attach-policy-arn {policy-arn} --approve --override-existing-serviceaccounts
kubectl apply -n {namespace} -f {your-externaldns-manifest.yaml}
It may be that there is some conflict going on, as you have overwritten what you created with eksctl create iamserviceaccount by also specifying a ServiceAccount in your ExternalDNS manifest.
Second
Upgrade your cluster to v1.19 (if it's not there already):
eksctl upgrade cluster --name {name} will show you what will be done;
eksctl upgrade cluster --name {name} --approve will do it
Third
Some documentation suggests that in addition to setting securityContext.fsGroup: 65534, you also need to set securityContext.runAsUser: 0.
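As an additional sanity check, you can verify that the IRSA webhook actually injected the web identity credentials into the pod; if it did, you should see AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE in its environment (a sketch, assuming the Deployment is called external-dns in the default namespace):
kubectl exec -n default deploy/external-dns -- env | grep ^AWS_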

I've been struggling with a similar issue after following the setup suggested here.
I ended up with the exception below in the deploy logs.
time="2021-05-10T06:40:17Z" level=error msg="records retrieval failed: failed to list hosted zones: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: 3fda6c69-2a0a-4bc9-b478-521b5131af9b"
time="2021-05-10T06:41:20Z" level=error msg="records retrieval failed: failed to list hosted zones: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: 7d3e07a2-c514-44fa-8e79-d49314d9adb6"
In my case, it was an issue with the wrong service account name being mapped to the newly created role.
Here is a step-by-step approach to get this done without too many hiccups.
Create the IAM Policy
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"route53:ChangeResourceRecordSets"
],
"Resource": [
"arn:aws:route53:::hostedzone/*"
]
},
{
"Effect": "Allow",
"Action": [
"route53:ListHostedZones",
"route53:ListResourceRecordSets"
],
"Resource": [
"*"
]
}
]
}
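Assuming the JSON above is saved as external-dns-policy.json, the policy can be created with the CLI roughly like this (the policy name matches the ARN attached in the next step):
aws iam create-policy --policy-name external-dns-policy-eks --policy-document file://external-dns-policy.json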
Create the IAM role and the service account for your EKS cluster.
eksctl create iamserviceaccount \
--name external-dns-sa-eks \
--namespace default \
--cluster aecops-grpc-test \
--attach-policy-arn arn:aws:iam::xxxxxxxx:policy/external-dns-policy-eks \
--approve \
--override-existing-serviceaccounts
Create a new hosted zone.
aws route53 create-hosted-zone --name "hosted.domain.com." --caller-reference "grpc-endpoint-external-dns-test-$(date +%s)"
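The create-hosted-zone call prints the new zone ID; if you need to look it up again later (for example for the --txt-owner-id argument below), something like this should work:
aws route53 list-hosted-zones-by-name --dns-name "hosted.domain.com." --query "HostedZones[0].Id" --output text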
Deploy ExternalDNS after creating the ClusterRole and ClusterRoleBinding for the previously created service account.
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: external-dns
rules:
- apiGroups: [""]
resources: ["services","endpoints","pods"]
verbs: ["get","watch","list"]
- apiGroups: ["extensions","networking.k8s.io"]
resources: ["ingresses"]
verbs: ["get","watch","list"]
- apiGroups: [""]
resources: ["nodes"]
verbs: ["list","watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: external-dns-viewer
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: external-dns
subjects:
- kind: ServiceAccount
name: external-dns-sa-eks
namespace: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: external-dns
spec:
strategy:
type: Recreate
selector:
matchLabels:
app: external-dns
template:
metadata:
labels:
app: external-dns
# If you're using kiam or kube2iam, specify the following annotation.
# Otherwise, you may safely omit it.
annotations:
iam.amazonaws.com/role: arn:aws:iam::***********:role/eksctl-eks-cluster-name-addon-iamserviceacco-Role1-156KP94SN7D7
spec:
serviceAccountName: external-dns-sa-eks
containers:
- name: external-dns
image: k8s.gcr.io/external-dns/external-dns:v0.7.6
args:
- --source=service
- --source=ingress
- --domain-filter=hosted.domain.com. # will make ExternalDNS see only the hosted zones matching provided domain, omit to process all available hosted zones
- --provider=aws
- --policy=upsert-only # would prevent ExternalDNS from deleting any records, omit to enable full synchronization
- --aws-zone-type=public # only look at public hosted zones (valid values are public, private or no value for both)
- --registry=txt
- --txt-owner-id=my-hostedzone-identifier
securityContext:
fsGroup: 65534 # For ExternalDNS to be able to read Kubernetes and AWS token files
Update Ingress resource with the domain name and reapply the manifest.
For ingress objects, ExternalDNS will create a DNS record based on the host specified for the ingress object.
- host: myapp.hosted.domain.com
Validate that the new records were created.
BASH-3.2$ aws route53 list-resource-record-sets --output json \
--hosted-zone-id "/hostedzone/Z065*********" --query "ResourceRecordSets[?Name == 'hosted.domain.com..']|[?Type == 'A']"
[
{
"Name": "myapp.hosted.domain.com..",
"Type": "A",
"AliasTarget": {
"HostedZoneId": "ZCT6F*******",
"DNSName": "****************.elb.ap-southeast-2.amazonaws.com.",
"EvaluateTargetHealth": true
}
} ]

In our case this issue occurred when using the Terraform module to create the EKS cluster, and eksctl to create the iamserviceaccount for the aws-load-balancer-controller. It all works fine the first time around, but if you do a terraform destroy, you need to do some cleanup, like deleting the CloudFormation stack created by eksctl. Somehow things got crossed, and the service account annotation was passing along a role that was no longer valid. So check the annotation of the service account to ensure it's valid, and update it if necessary. Then, in my case, I deleted and redeployed the aws-load-balancer-controller:
%> kubectl describe serviceaccount aws-load-balancer-controller -n kube-system
Name: aws-load-balancer-controller
Namespace: kube-system
Labels: app.kubernetes.io/managed-by=eksctl
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::212222224610:role/eksctl-ch-test-addon-iamserviceaccou-Role1-JQL4R3JM7I1A
Image pull secrets: <none>
Mountable secrets: aws-load-balancer-controller-token-b8hw7
Tokens: aws-load-balancer-controller-token-b8hw7
Events: <none>
%>
%> kubectl annotate --overwrite serviceaccount aws-load-balancer-controller eks.amazonaws.com/role-arn='arn:aws:iam::212222224610:role/eksctl-ch-test-addon-iamserviceaccou-Role1-17A92GGXZRY6O' -n kube-system
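Note that running pods keep the credentials they were started with, so after changing the annotation you usually need to recreate them, e.g.:
%> kubectl rollout restart deployment aws-load-balancer-controller -n kube-system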

In my case, I was able to attach the OIDC role with a Route 53 permissions policy, and that resolved the error.
https://medium.com/swlh/amazon-eks-setup-external-dns-with-oidc-provider-and-kube2iam-f2487c77b2a1
I then annotated the external-dns service account with that role instead of relying on the cluster role.
annotations:
# Substitute your account ID and IAM service role name below.
eks.amazonaws.com/role-arn: arn:aws:iam::<account>:role/external-dns-service-account-oidc-role

For me the issue was that the trust relationship was (correctly) set up using one partition, whereas the ServiceAccount was annotated with a different partition, like so:
...
"Principal": {
"Federated": "arn:aws-us-gov:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
},
...
kind: ServiceAccount
metadata:
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::{{ .Values.aws.account }}:role/{{ .Values.aws.roleName }}
Notice arn:aws:iam vs. arn:aws-us-gov:iam.
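A quick way to spot this kind of mismatch is to print both sides and compare the partitions (a sketch; substitute your role and service account names):
# Partition used in the role's trust policy
aws iam get-role --role-name <role-name> --query 'Role.AssumeRolePolicyDocument.Statement[0].Principal.Federated' --output text
# Partition used in the ServiceAccount annotation
kubectl get sa <service-account> -n <namespace> -o yaml | grep eks.amazonaws.com/role-arn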

Related

EKS: can't see nodes and nodes are not joining the cluster

I read all the AWS articles and followed them one by one, but none of them worked. Let me briefly summarize my situation. I created the EKS setup with Terraform: 1 VPC, 3 public subnets, 3 private subnets, 3 security groups, 1 NAT gateway (in a public subnet), and 2 autoscaled worker node groups. I checked all the infrastructure created with Terraform; there are no problems.
My main problem is that after the installation I can't see the nodes, and the nodes are not joining the cluster. I applied the steps below but they didn't work. What should I do? By the way, please don't tag my question as a duplicate; I checked all the similar questions on Stack Overflow. My steps look correct but do not work.
kubectl get nodes
No resources found
Before checking the nodes with the above command, I first applied the command below to set up my kubeconfig.
aws eks update-kubeconfig --name eks-DS7h --region us-east-1
Here my kubeconfig:
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: LS0tLS1CRUdJfgzsfhadfzasdfrzsd.........
server: https://0F97E579A.gr7.us-east-1.eks.amazonaws.com
name: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
contexts:
- context:
cluster: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
user: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
name: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
current-context: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
kind: Config
preferences: {}
users:
- name: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
user:
exec:
apiVersion: client.authentication.k8s.io/v1beta1
args:
- --region
- us-east-1
- eks
- get-token
- --cluster-name
- eks-DS7h
command: aws
After this I checked the nodes again, but I still get No resources found. Then I tried to edit aws-auth. Before the edit, I checked my user in the terminal where I ran all the Terraform steps.
aws sts get-caller-identity
{
"UserId": "ASDFGSDFGDGSDGDFHSFDSDC",
"Account": "545153234644",
"Arn": "arn:aws:iam::545153234644:user/white"
}
I took my user info and added it to the blank mapUsers area in aws-auth, but I'm still getting No resources found.
kubectl get cm -n kube-system aws-auth
apiVersion: v1
data:
mapAccounts: |
[]
mapRoles: |
- "groups":
- "system:bootstrappers"
- "system:nodes"
- "system:masters"
"rolearn": "arn:aws:iam::545153234644:role/eks-DS7h22060508195731770000000e"
"username": "system:node:{{EC2PrivateDNSName}}"
mapUsers: "- \"userarn\": \"arn:aws:iam::545153234644:user/white\"\n \"username\":
\"white\"\n \"groups\":\n - \"system:masters\"\n - \"system:nodes\" \n"
kind: ConfigMap
metadata:
creationTimestamp: "2022-06-05T08:20:02Z"
labels:
app.kubernetes.io/managed-by: Terraform
terraform.io/module: terraform-aws-modules.eks.aws
name: aws-auth
namespace: kube-system
resourceVersion: "4976"
uid: b12341-33ff-4f78-af0a-758f88
Also, when I check the EKS cluster in the dashboard I see the warning below. I don't know whether it is relevant or not, but I want to share it in case it helps.
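For what it's worth, when nodes are missing it is usually worth confirming that the rolearn in mapRoles is exactly the node group's role and that the worker instances are actually running. A couple of hedged checks, assuming managed node groups (which tag their instances with eks:cluster-name):
# Role ARN the managed node group really uses - must match aws-auth
aws eks describe-nodegroup --cluster-name eks-DS7h --nodegroup-name <nodegroup-name> --query "nodegroup.nodeRole" --output text
# Are the worker instances up at all?
aws ec2 describe-instances --filters "Name=tag:eks:cluster-name,Values=eks-DS7h" \
  --query "Reservations[].Instances[].[InstanceId,State.Name]" --output table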

How to allow an assumed role to connect from EC2 to EKS on AWS?

I created an EC2 instance and an EKS cluster in the same AWS account.
In order to use the EKS cluster from EC2, I have to grant the necessary permissions to it.
I added an instance profile role with some EKS operation permissions. Its role ARN is arn:aws:iam::11111111:role/ec2-instance-profile-role (A) in the dashboard. But inside the EC2 instance, it shows up as arn:aws:sts::11111111:assumed-role/ec2-instance-profile-role/i-00000000 (B).
$ aws sts get-caller-identity
{
"Account": "11111111",
"UserId": "AAAAAAAAAAAAAAA:i-000000000000",
"Arn": "arn:aws:sts::11111111:assumed-role/ec2-instance-profile-role/i-00000000"
}
I also created an aws-auth ConfigMap in EKS so that the EC2 instance profile role can be registered and granted access. I tried setting both A and B in mapRoles; both hit the same issue. When I run kubectl commands on EC2:
$ aws eks --region aws-region update-kubeconfig --name eks-cluster-name
$ kubectl config view --minify
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: DATA+OMITTED
server: https://xxxxxxxxxxxxxxxxxxxxxxxxxxxx.aw1.aws-region.eks.amazonaws.com
name: arn:aws:eks:aws-region:11111111:cluster/eks-cluster-name
contexts:
- context:
cluster: arn:aws:eks:aws-region:11111111:cluster/eks-cluster-name
user: arn:aws:eks:aws-region:11111111:cluster/eks-cluster-name
name: arn:aws:eks:aws-region:11111111:cluster/eks-cluster-name
current-context: arn:aws:eks:aws-region:11111111:cluster/eks-cluster-name
kind: Config
preferences: {}
users:
- name: arn:aws:eks:aws-region:11111111:cluster/eks-cluster-name
user:
exec:
apiVersion: client.authentication.k8s.io/v1alpha1
args:
- --region
- aws-region
- eks
- get-token
- --cluster-name
- eks-cluster-name
- --role
- arn:aws:sts::11111111:assumed-role/ec2-instance-profile-role/i-00000000
command: aws
env: null
provideClusterInfo: false
$ kubectl get svc
error: You must be logged in to the server (Unauthorized)
I also checked the principal type in the assumed role's trust policy. It's Service, not AWS.
It seems this (AWS) type is necessary:
{
"Version": "2012-10-17",
"Statement": {
"Effect": "Allow",
"Principal": { "AWS": "arn:aws:iam:: 333333333333:root" },
"Action": "sts:AssumeRole"
}
}
Terraform aws assume role
But I tried creating a new assumable role with the AWS principal type and setting it in Kubernetes' aws-auth ConfigMap, and I still get the same issue.
How do I use it? Do I need to create a new IAM user instead?
- name: external-staging
user:
exec:
apiVersion: client.authentication.k8s.io/v1alpha1
args:
- exec
- test-dev
- --
- aws
- eks
- get-token
- --cluster-name
- eksCluster-1234
- --role-arn
- arn:aws:iam::3456789002:role/eks-cluster-admin-role-e65f32f
command: aws-vault
env: null
This config file is working for me. The key points are using --role-arn and command: aws-vault.
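If you are not using aws-vault, the equivalent debugging steps are to confirm which identity the token is generated for, and that aws-auth maps the plain IAM role ARN (arn:aws:iam::11111111:role/ec2-instance-profile-role), not the arn:aws:sts::...:assumed-role/... form. A rough check:
# Identity the instance profile resolves to
aws sts get-caller-identity
# Token kubectl will present to the API server
aws eks get-token --cluster-name eks-cluster-name --query 'status.expirationTimestamp'
# Inspect the mapping from a machine that still has admin access
kubectl -n kube-system get configmap aws-auth -o yaml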

Fixing DataDog agent congestion issues in Amazon EKS cluster

A few months ago I integrated DataDog into my Kubernetes cluster by using a DaemonSet configuration. Since then I've been getting congestion alerts with the following message:
Please tune the hot-shots settings
https://github.com/brightcove/hot-shots#errors
By attempting to follow the docs with my limited Orchestration/DevOps knowledge, what I could gather is that I need to add the following to my DaemonSet config:
spec
.
.
securityContext:
sysctls:
- name: net.unix.max_dgram_qlen
value: "1024"
- name: net.core.wmem_max
value: "4194304"
I attempted to add that configuration piece to one of the auto-deployed DataDog pods directly just to try it out, but it hangs indefinitely and doesn't save the configuration (instead of adding it to the DaemonSet and risking bringing all agents down).
That hot-shots documentation also mentions that the above sysctl configuration requires unsafe sysctls to be enabled in the nodes that contain the pods:
kubelet --allowed-unsafe-sysctls \
'net.unix.max_dgram_qlen, net.core.wmem_max'
The cluster I am working with is fully deployed with EKS using the AWS dashboard (I have little knowledge of how it is configured). The above seems to be intended for a manually deployed and managed cluster.
Why is the configuration I am attempting to apply to a single DataDog agent pod not saving/applying? Is it because it is managed by DaemonSet or is it because it doesn't have the proper unsafe sysctl allowed? Something else?
If I do need to enable the suggested unsafe sysctls on all nodes of my cluster, how do I go about it, given that the cluster is fully deployed and managed by Amazon EKS?
So we managed to achieve this using a custom launch template with our managed node group and then passing in a custom bootstrap script. This does mean, however, that you need to supply the AMI ID yourself and you lose the alerts in the console when it is outdated. In Terraform this would look like:
resource "aws_eks_node_group" "group" {
...
launch_template {
id = aws_launch_template.nodes.id
version = aws_launch_template.nodes.latest_version
}
...
}
data "template_file" "bootstrap" {
template = file("${path.module}/files/bootstrap.tpl")
vars = {
cluster_name = aws_eks_cluster.cluster.name
cluster_auth_base64 = aws_eks_cluster.cluster.certificate_authority.0.data
endpoint = aws_eks_cluster.cluster.endpoint
}
}
data "aws_ami" "eks_node" {
owners = ["602401143452"]
most_recent = true
filter {
name = "name"
values = ["amazon-eks-node-1.21-v20211008"]
}
}
resource "aws_launch_template" "nodes" {
...
image_id = data.aws_ami.eks_node.id
user_data = base64encode(data.template_file.bootstrap.rendered)
...
}
Then the bootstrap.tpl file looks like this:
#!/bin/bash
set -o xtrace
systemctl stop kubelet
/etc/eks/bootstrap.sh '${cluster_name}' \
--b64-cluster-ca '${cluster_auth_base64}' \
--apiserver-endpoint '${endpoint}' \
--kubelet-extra-args '"--allowed-unsafe-sysctls=net.unix.max_dgram_qlen"'
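To confirm the flag actually reached the kubelet after a node joins, you can SSH/SSM onto the instance and inspect the running process (a simple sanity check, nothing EKS-specific):
# On the node itself
pgrep -a kubelet | tr ' ' '\n' | grep allowed-unsafe-sysctls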
The next step is to set up the PodSecurityPolicy, ClusterRole and RoleBinding in your cluster so you can use the securityContext as you described above and then pods in that namespace will be able to run without a SysctlForbidden message.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
name: sysctl
spec:
allowPrivilegeEscalation: false
allowedUnsafeSysctls:
- net.unix.max_dgram_qlen
defaultAllowPrivilegeEscalation: false
fsGroup:
rule: RunAsAny
runAsUser:
rule: RunAsAny
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: allow-sysctl
rules:
- apiGroups:
- policy
resourceNames:
- sysctl
resources:
- podsecuritypolicies
verbs:
- '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: allow-sysctl
namespace: app-namespace
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: allow-sysctl
subjects:
- apiGroup: rbac.authorization.k8s.io
kind: Group
name: system:serviceaccounts:app-namespace
If using the DataDog Helm chart, you can set the following values to update the securityContext of the agent. But you will have to update the chart's PSP manually to set allowedUnsafeSysctls:
datadog:
securityContext:
sysctls:
- name: net.unix.max_dgram_qlen
value: "512"
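With those values saved to e.g. datadog-values.yaml, rolling them out would look roughly like the below (the release name and namespace are assumptions):
helm repo add datadog https://helm.datadoghq.com
helm upgrade --install datadog datadog/datadog -n datadog -f datadog-values.yaml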

What does "eksctl create iamserviceaccount" do under the hood on an EKS cluster?

AWS supports IAM Roles for Service Accounts (IRSA) that allows cluster operators to map AWS IAM Roles to Kubernetes Service Accounts.
To do so, one has to create an iamserviceaccount in an EKS cluster:
eksctl create iamserviceaccount \
--name <AUTOSCALER_NAME> \
--namespace kube-system \
--cluster <CLUSTER_NAME> \
--attach-policy-arn <POLICY_ARN> \
--approve \
--override-existing-serviceaccounts
The problem is that I don't want to use the above eksctl command because I want to declare my infrastructure using terraform.
Does the eksctl command do anything other than creating a service account? If it only creates a service account, what is the YAML representation of it?
I am adding my answer here because I stumbled upon the same issue, and the accepted answer (and the other answers above) do not provide a full resolution to the issue - there are no code examples. They are just guidelines which I had to use to research much deeper. There are some issues which are really easy to miss, and without code examples it is quite hard to work out what is happening (especially the part related to Conditions/StringEquals while creating the IAM role).
The whole purpose of creating a service account which is going to be tied to the role is the possibility of creating AWS resources from within the cluster (the most common cases are load balancers, or roles for pushing logs to CloudWatch).
So the question is how we can do this using Terraform, instead of using eksctl commands.
What we need to do, is:
create eks oidc (which can be done with terraform)
create AWS IAM role (which can be done with terraform), create and use proper policies
Create a k8s service account (needs to be done with kubectl commands, or with terraform using kubernetes resources)
Annotate the k8s service account with the IAM role we created (meaning that we are linking the k8s service account with the IAM role)
After this setup, our k8s service account will have a k8s cluster role and a k8s cluster role binding (which will allow that service account to perform actions within k8s), and our k8s service account will have an IAM role attached to it, which will allow it to perform actions outside of the cluster (like creating AWS resources).
So let's start with it. The assumption below is that your EKS cluster is already created with Terraform, and we are focusing on creating the resources around that EKS cluster which are necessary for a working service account.
Create eks_oidc
### First we need to create tls certificate
data "tls_certificate" "eks-cluster-tls-certificate" {
url = aws_eks_cluster.eks-cluster.identity[0].oidc[0].issuer
}
# After that create oidc
resource "aws_iam_openid_connect_provider" "eks-cluster-oidc" {
client_id_list = ["sts.amazonaws.com"]
thumbprint_list = [data.tls_certificate.eks-cluster-tls-certificate.certificates[0].sha1_fingerprint]
url = aws_eks_cluster.eks-cluster.identity[0].oidc[0].issuer
}
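After an apply, it is easy to sanity-check that the provider exists and matches the cluster's issuer URL:
aws eks describe-cluster --name <cluster-name> --query "cluster.identity.oidc.issuer" --output text
aws iam list-open-id-connect-providers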
Now, let's create the AWS IAM role with all the necessary policies.
The Terraform declarative code below will:
create the ALBIngressControllerIAMPolicy policy
create the alb-ingress-controller-role role
attach the ALBIngressControllerIAMPolicy policy to the alb-ingress-controller-role role
attach the already existing AmazonEKS_CNI_Policy policy to the role
Note that I used "alb ingress controller" suffixes here, because that is the primary use of my role from within the cluster. You can change the name of the policy or the role, and you can change the permissions of the policy as well, depending on what you are planning to do with it.
data "aws_caller_identity" "current" {}
locals {
account_id = data.aws_caller_identity.current.account_id
eks_oidc = replace(replace(aws_eks_cluster.eks-cluster.endpoint, "https://", ""), "/\\..*$/", "")
}
# Policy which will allow us to create application load balancer from inside of cluster
resource "aws_iam_policy" "ALBIngressControllerIAMPolicy" {
name = "ALBIngressControllerIAMPolicy"
description = "Policy which will be used by role for service - for creating alb from within cluster by issuing declarative kube commands"
policy = jsonencode({
Version = "2012-10-17",
Statement = [
{
Effect = "Allow",
Action = [
"elasticloadbalancing:ModifyListener",
"wafv2:AssociateWebACL",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:DescribeInstances",
"wafv2:GetWebACLForResource",
"elasticloadbalancing:RegisterTargets",
"iam:ListServerCertificates",
"wafv2:GetWebACL",
"elasticloadbalancing:SetIpAddressType",
"elasticloadbalancing:DeleteLoadBalancer",
"elasticloadbalancing:SetWebAcl",
"ec2:DescribeInternetGateways",
"elasticloadbalancing:DescribeLoadBalancers",
"waf-regional:GetWebACLForResource",
"acm:GetCertificate",
"shield:DescribeSubscription",
"waf-regional:GetWebACL",
"elasticloadbalancing:CreateRule",
"ec2:DescribeAccountAttributes",
"elasticloadbalancing:AddListenerCertificates",
"elasticloadbalancing:ModifyTargetGroupAttributes",
"waf:GetWebACL",
"iam:GetServerCertificate",
"wafv2:DisassociateWebACL",
"shield:GetSubscriptionState",
"ec2:CreateTags",
"elasticloadbalancing:CreateTargetGroup",
"ec2:ModifyNetworkInterfaceAttribute",
"elasticloadbalancing:DeregisterTargets",
"elasticloadbalancing:DescribeLoadBalancerAttributes",
"ec2:RevokeSecurityGroupIngress",
"elasticloadbalancing:DescribeTargetGroupAttributes",
"shield:CreateProtection",
"acm:DescribeCertificate",
"elasticloadbalancing:ModifyRule",
"elasticloadbalancing:AddTags",
"elasticloadbalancing:DescribeRules",
"ec2:DescribeSubnets",
"elasticloadbalancing:ModifyLoadBalancerAttributes",
"waf-regional:AssociateWebACL",
"tag:GetResources",
"ec2:DescribeAddresses",
"ec2:DeleteTags",
"shield:DescribeProtection",
"shield:DeleteProtection",
"elasticloadbalancing:RemoveListenerCertificates",
"tag:TagResources",
"elasticloadbalancing:RemoveTags",
"elasticloadbalancing:CreateListener",
"elasticloadbalancing:DescribeListeners",
"ec2:DescribeNetworkInterfaces",
"ec2:CreateSecurityGroup",
"acm:ListCertificates",
"elasticloadbalancing:DescribeListenerCertificates",
"ec2:ModifyInstanceAttribute",
"elasticloadbalancing:DeleteRule",
"cognito-idp:DescribeUserPoolClient",
"ec2:DescribeInstanceStatus",
"elasticloadbalancing:DescribeSSLPolicies",
"elasticloadbalancing:CreateLoadBalancer",
"waf-regional:DisassociateWebACL",
"elasticloadbalancing:DescribeTags",
"ec2:DescribeTags",
"elasticloadbalancing:*",
"elasticloadbalancing:SetSubnets",
"elasticloadbalancing:DeleteTargetGroup",
"ec2:DescribeSecurityGroups",
"iam:CreateServiceLinkedRole",
"ec2:DescribeVpcs",
"ec2:DeleteSecurityGroup",
"elasticloadbalancing:DescribeTargetHealth",
"elasticloadbalancing:SetSecurityGroups",
"elasticloadbalancing:DescribeTargetGroups",
"shield:ListProtections",
"elasticloadbalancing:ModifyTargetGroup",
"elasticloadbalancing:DeleteListener"
],
Resource = "*"
}
]
})
}
# Create IAM role
resource "aws_iam_role" "alb-ingress-controller-role" {
name = "alb-ingress-controller"
assume_role_policy = <<POLICY
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"Federated": "${aws_iam_openid_connect_provider.eks-cluster-oidc.arn}"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"${replace(aws_iam_openid_connect_provider.eks-cluster-oidc.url, "https://", "")}:sub": "system:serviceaccount:kube-system:alb-ingress-controller",
"${replace(aws_iam_openid_connect_provider.eks-cluster-oidc.url, "https://", "")}:aud": "sts.amazonaws.com"
}
}
}
]
}
POLICY
depends_on = [aws_iam_openid_connect_provider.eks-cluster-oidc]
tags = {
"ServiceAccountName" = "alb-ingress-controller"
"ServiceAccountNameSpace" = "kube-system"
}
}
# Attach policies to IAM role
resource "aws_iam_role_policy_attachment" "alb-ingress-controller-role-ALBIngressControllerIAMPolicy" {
policy_arn = aws_iam_policy.ALBIngressControllerIAMPolicy.arn
role = aws_iam_role.alb-ingress-controller-role.name
depends_on = [aws_iam_role.alb-ingress-controller-role]
}
resource "aws_iam_role_policy_attachment" "alb-ingress-controller-role-AmazonEKS_CNI_Policy" {
role = aws_iam_role.alb-ingress-controller-role.name
policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
depends_on = [aws_iam_role.alb-ingress-controller-role]
}
After executing the Terraform above, you have successfully created the Terraform part of the resources. Now we need to create a k8s service account and bind the IAM role to that service account.
Creating the cluster role, cluster role binding and service account
You can use
https://raw.githubusercontent.com/kubernetes-sigs/aws-alb-ingress-controller/master/docs/examples/rbac-role.yaml
directly (from the master branch), but keeping in mind that we need to annotate the IAM role ARN, I tend to download this file, update it and store the updated version with my kubectl config files.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
app.kubernetes.io/name: alb-ingress-controller
name: alb-ingress-controller
rules:
- apiGroups:
- ""
- extensions
resources:
- configmaps
- endpoints
- events
- ingresses
- ingresses/status
- services
- pods/status
verbs:
- create
- get
- list
- update
- watch
- patch
- apiGroups:
- ""
- extensions
resources:
- nodes
- pods
- secrets
- services
- namespaces
verbs:
- get
- list
- watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
app.kubernetes.io/name: alb-ingress-controller
name: alb-ingress-controller
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: alb-ingress-controller
subjects:
- kind: ServiceAccount
name: alb-ingress-controller
namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
app.kubernetes.io/name: alb-ingress-controller
name: alb-ingress-controller
namespace: kube-system
annotations:
eks.amazonaws.com/role-arn: <ARN OF YOUR ROLE HERE>
...
At the bottom of this file, you will notice the annotation where you will need to place your role ARN.
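One way to fill in that annotation and apply the manifest (assuming the downloaded file is saved as rbac-role.yaml, the role name from the Terraform above, and GNU sed on Linux):
ROLE_ARN=$(aws iam get-role --role-name alb-ingress-controller --query 'Role.Arn' --output text)
# Substitute the placeholder in the downloaded manifest, then apply it
sed -i "s|<ARN OF YOUR ROLE HERE>|$ROLE_ARN|" rbac-role.yaml
kubectl apply -f rbac-role.yaml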
Double check
And that would be it. After that you have a k8s service account which is connected with the IAM role.
Check with:
kubectl get sa -n kube-system
kubectl describe sa alb-ingress-controller -n kube-system
And you should get output similar to this (the annotations field is the most important part, because it confirms the attachment of the IAM role):
Name: alb-ingress-controller
Namespace: kube-system
Labels: app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=alb-ingress-controller
Annotations: eks.amazonaws.com/role-arn: <YOUR ARN WILL BE HERE>
meta.helm.sh/release-name: testrelease
meta.helm.sh/release-namespace: default
Image pull secrets: <none>
Mountable secrets: alb-ingress-controller-token-l4pd8
Tokens: alb-ingress-controller-token-l4pd8
Events: <none>
From now on, you can use this service account to manage internal k8s resources, as well as the external ones allowed by the policies you attached.
In my case, as mentioned before, I used it (besides other things) for the creation of the ALB ingress controller and a load balancer, hence all of the "alb-ingress" prefixes.
First, you should define the IAM role in Terraform.
Second, you should configure the aws-auth ConfigMap in Kubernetes to map the IAM role to a Kubernetes user or service account. You can do that in Terraform using the Kubernetes provider.
There is already a Terraform module, terraform-aws-eks, which manages all aspects of an EKS cluster. You may take some ideas from it.
After Vasili Angapov's help, I can now answer the question:
Yes, it does more than just creating a service account. It does three things:
It creates an IAM role.
It attaches the desired IAM policy (--attach-policy-arn <POLICY_ARN>) to the created IAM role.
It creates a new Kubernetes service account annotated with the ARN of the created IAM role.
Now it's easy to declare the above steps using the kubernetes and aws providers in Terraform.
A role created for that purpose looks like this:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::<account-id>:oidc-provider/oidc.eks.<region>.amazonaws.com/id/<oidc-id>"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"oidc.eks.<region>.amazonaws.com/id/<oidc-id>": "system:serviceaccount:<kube-serviceaccount-namespace>:<kube-serviceaccount-name>"
}
}
}
]
}
I highly recommend using the iam_assumable_role_admin Terraform module to create this IAM role for you.
Docs
Example

Kubernetes/kops: error attaching EBS volume to instance. You are not authorized to perform this operation. Error 403

I tested a Kubernetes deployment with EBS volume mounting on an AWS cluster provisioned by kops. This is the deployment YAML file:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: helloworld-deployment-volume
spec:
replicas: 1
template:
metadata:
labels:
app: helloworld
spec:
containers:
- name: k8s-demo
image: wardviaene/k8s-demo
ports:
- name: nodejs-port
containerPort: 3000
volumeMounts:
- mountPath: /myvol
name: myvolume
volumes:
- name: myvolume
awsElasticBlockStore:
volumeID: <volume_id>
After kubectl create -f <path_to_this_yml>, I got the following message in the pod description:
Attach failed for volume "myvolume" : Error attaching EBS volume "XXX" to instance "YYY": "UnauthorizedOperation: You are not authorized to perform this operation. status code: 403
Looks like this is just a permissions issue. OK, I checked the policy for the node role (IAM -> Roles -> nodes.<my_domain>) and found that there were no actions which allow manipulating volumes; there was only the ec2:DescribeInstances action by default. So I added the AttachVolume and DetachVolume actions:
{
"Sid": "kopsK8sEC2NodePerms",
"Effect": "Allow",
"Action": [
"ec2:DescribeInstances",
"ec2:AttachVolume",
"ec2:DetachVolume"
],
"Resource": [
"*"
]
},
And this didn't help. I'm still getting that error:
Attach failed for volume "myvolume" : Error attaching EBS volume "XXX" to instance "YYY": "UnauthorizedOperation: You are not authorized to perform this operation.
Am I missing something?
I found a solution. It's described here.
In kops 1.8.0-beta.1, the master node requires you to tag the AWS volume with:
KubernetesCluster: <clustername-here>
So it's necessary to create the EBS volume with that tag using the awscli:
aws ec2 create-volume --size 10 --region eu-central-1 --availability-zone eu-central-1a --volume-type gp2 --tag-specifications 'ResourceType=volume,Tags=[{Key=KubernetesCluster,Value=<clustername-here>}]'
or you can tag it manually in EC2 -> Volumes -> Your volume -> Tags.
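The CLI equivalent for tagging an already existing volume (the volume ID and cluster name are placeholders) is:
aws ec2 create-tags --resources <volume-id> --tags Key=KubernetesCluster,Value=<clustername-here>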
That's it.
EDIT:
The right cluster name can be found in the tags of the EC2 instances that are part of the cluster. The key is the same: KubernetesCluster.