AWS SSO authorization for EKS fails to call sts:AssumeRole

I'm migrating to AWS SSO for CLI access, which has worked for everything except kubectl so far.
While troubleshooting it I followed a few guides, which means I ended up with some cargo-cult behaviour, and I'm obviously missing something in my mental model.
aws sts get-caller-identity
{
"UserId": "<redacted>",
"Account": "<redacted>",
"Arn": "arn:aws:sts::<redacted>:assumed-role/AWSReservedSSO_DeveloperReadonly_a6a1426b0fdf9f87/<my username>"
}
kubectl get pods
An error occurred (AccessDenied) when calling the AssumeRole operation: User: arn:aws:sts:::assumed-role/AWSReservedSSO_DeveloperReadonly_a6a1426b0fdf9f87/ is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam:::role/aws-reserved/sso.amazonaws.com/us-east-2/AWSReservedSSO_DeveloperReadonly_a6a1426b0fdf9f87
It's amusing that it seems to be trying to assume the same role that it's already using, but I'm not sure how to fix it.
~/.aws/config (subset - I have other profiles, but they aren't relevant here)
[default]
region = us-east-2
output = json
[profile default]
sso_start_url = https://<redacted>.awsapps.com/start
sso_account_id = <redacted>
sso_role_name = DeveloperReadonly
region = us-east-2
sso_region = us-east-2
output = json
~/.kube/config (with clusters removed)
apiVersion: v1
contexts:
- context:
cluster: arn:aws:eks:us-east-2:<redacted>:cluster/foo
user: ro
name: ro
current-context: ro
kind: Config
preferences: {}
users:
- name: ro
user:
exec:
apiVersion: client.authentication.k8s.io/v1alpha1
args:
- --region
- us-east-2
- eks
- get-token
- --cluster-name
- foo
- --role
- arn:aws:iam::<redacted>:role/aws-reserved/sso.amazonaws.com/us-east-2/AWSReservedSSO_DeveloperReadonly_a6a1426b0fdf9f87
command: aws
env: null
aws-auth mapRoles snippet
- rolearn: arn:aws:iam::<redacted>:role/AWSReservedSSO_DeveloperReadonly_a6a1426b0fdf9f87
  username: "devread:{{SessionName}}"
  groups:
  - view
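Note that the rolearn in the mapRoles snippet has no aws-reserved/sso.amazonaws.com/us-east-2/ path, while the --role argument in the kubeconfig does. As far as I know, aws-auth only accepts role ARNs without the path. The sketch below shows that transformation; the function name and the account ID used in the comment are hypothetical placeholders, not values from the post.

```python
def strip_role_path(role_arn: str) -> str:
    """Drop the path from an IAM role ARN, keeping only the role name.

    Hypothetical example: arn:aws:iam::123456789012:role/some/path/MyRole
    becomes arn:aws:iam::123456789012:role/MyRole.
    """
    prefix, separator, resource = role_arn.partition(":role/")
    if not separator:
        raise ValueError(f"not an IAM role ARN: {role_arn}")
    # The role name is the last slash-separated segment of the resource part.
    role_name = resource.rsplit("/", 1)[-1]
    return f"{prefix}:role/{role_name}"
```

A role ARN that already has no path comes back unchanged, so the function is safe to apply to every mapRoles entry.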
What obvious thing am I missing? I've reviewed the other stackoverflow posts with similar issues, but none had the arn:aws:sts:::assumed-role -> arn:aws:iam:::role path.

~/.aws/config had a subtle error: [profile default] isn't meaningful, so the two blocks should have been merged into [default]. Only non-default profiles should have the profile prefix in the section name.
[default]
sso_start_url = https://<redacted>.awsapps.com/start
sso_account_id = <redacted>
sso_role_name = DeveloperReadonly
region = us-east-2
sso_region = us-east-2
output = json
[profile rw]
sso_start_url = https://<redacted>.awsapps.com/start
sso_account_id = <redacted>
sso_role_name = DeveloperReadWrite
region = us-east-2
sso_region = us-east-2
output = json
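To catch this kind of naming mistake before it bites, a small check along these lines may help. It is only a sketch under the assumption that the config file is plain INI text; the function name is made up for illustration, and it only flags the section name this answer identifies as the problem.

```python
import configparser
from typing import List


def misnamed_default_sections(config_text: str) -> List[str]:
    """Return section names that spell the default profile as 'profile default'."""
    # interpolation=None keeps '%' characters in values from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return [
        name
        for name in parser.sections()
        if name.split() == ["profile", "default"]
    ]
```

Calling it with the contents of ~/.aws/config returns an empty list once the blocks have been merged into [default].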
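I also changed .kube/config to get the token based on the profile instead of naming the role explicitly. This fixed the AssumeRole failure, because the token is now requested with the role the SSO session already holds rather than by trying to assume that same role again.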
apiVersion: v1
contexts:
- context:
    cluster: arn:aws:eks:us-east-2:<redacted>:cluster/foo
    user: ro
  name: ro
current-context: ro
kind: Config
preferences: {}
users:
- name: ro
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      args:
      - --region
      - us-east-2
      - eks
      - get-token
      - --cluster-name
      - foo
      - --profile
      - default
      command: aws
      env: null
I can now run kubectl config use-context ro or the other profiles I've defined (omitted for brevity).
On a related note, I had some trouble getting an older Terraform version to work, since the S3 backend didn't handle SSO. aws-vault solved this for me.

Related

How can I use aws sso login and sso-sessions with the serverless-better-credentials serverless plugin?

I get ProcessCredentialsProviderFailure: Profile default not found when running serverless commands (see the steps to reproduce below), which does not seem right since my ~/.aws/config looks like this:
[sso-session aphexlog]
sso_start_url = https://aphexlog.awsapps.com/start
sso_region = us-west-2
sso_registration_scopes = sso:account:access
[profile elevator-robot]
sso_session = aphexlog
sso_account_id = 12345678910
sso_role_name = AWSAdministratorAccess
region = us-east-1
output = json
serverless.json:
service: dog
frameworkVersion: '3'
provider:
name: aws
runtime: nodejs18.x
functions:
dog:
handler: index.handler
plugins:
- serverless-better-credentials
steps to reproduce:
run npm i --save-dev serverless-better-credentials
run aws sso login --profile elevator-robot
run serverless info --aws-profile elevator-robot
then you get the error
However, if I just export my credentials (secret keys) as environment variables, it works fine.

EKS: can't see nodes, and nodes do not join the cluster

I have read all the AWS articles and followed each of them step by step, but none of them worked. Let me briefly summarize my situation. I created EKS automation with Terraform: 1 VPC, 3 public subnets, 3 private subnets, 3 security groups, 1 NAT gateway (in a public subnet), and 2 autoscaled worker node groups. I checked all the infrastructure created by Terraform and found no problems.
My main problem is that after the installation I can't see the nodes, and the nodes do not join the cluster. I applied the steps below, but they didn't work. What should I do? Please don't mark my question as a duplicate; I have checked all the similar questions on Stack Overflow. My steps look correct but do not work.
kubectl get nodes
No resources found
Before checking the nodes with the command above, I first ran the command below to set up my kubeconfig.
aws eks update-kubeconfig --name eks-DS7h --region us-east-1
Here my kubeconfig:
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: LS0tLS1CRUdJfgzsfhadfzasdfrzsd.........
server: https://0F97E579A.gr7.us-east-1.eks.amazonaws.com
name: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
contexts:
- context:
cluster: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
user: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
name: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
current-context: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
kind: Config
preferences: {}
users:
- name: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
user:
exec:
apiVersion: client.authentication.k8s.io/v1beta1
args:
- --region
- us-east-1
- eks
- get-token
- --cluster-name
- eks-DS7h
command: aws
After this I checked the nodes again, but I still get No resources found. Then I tried to edit aws-auth. Before the edit, I checked my user in the terminal where I ran all the Terraform installation steps.
aws sts get-caller-identity
{
"UserId": "ASDFGSDFGDGSDGDFHSFDSDC",
"Account": "545153234644",
"Arn": "arn:aws:iam::545153234644:user/white"
}
I took my user info and added it to the empty mapUsers section in aws-auth. But I am still getting No resources found.
kubectl get cm -n kube-system aws-auth
apiVersion: v1
data:
mapAccounts: |
[]
mapRoles: |
- "groups":
- "system:bootstrappers"
- "system:nodes"
- "system:masters"
"rolearn": "arn:aws:iam::545153234644:role/eks-DS7h22060508195731770000000e"
"username": "system:node:{{EC2PrivateDNSName}}"
mapUsers: "- \"userarn\": \"arn:aws:iam::545153234644:user/white\"\n \"username\":
\"white\"\n \"groups\":\n - \"system:masters\"\n - \"system:nodes\" \n"
kind: ConfigMap
metadata:
creationTimestamp: "2022-06-05T08:20:02Z"
labels:
app.kubernetes.io/managed-by: Terraform
terraform.io/module: terraform-aws-modules.eks.aws
name: aws-auth
namespace: kube-system
resourceVersion: "4976"
uid: b12341-33ff-4f78-af0a-758f88
Also, when I check the EKS cluster on the dashboard I see the warning below. I don't know whether it is relevant, but I am sharing it in case it helps.

How to allow an assume role connect from EC2 to EKS on AWS?

I created an EC2 instance and an EKS cluster in the same AWS account.
In order to use the EKS cluster from EC2, I have to grant necessary permissions to it.
I added an instance profile role with some EKS operation permissions. Its role arn is arn:aws:iam::11111111:role/ec2-instance-profile-role(A) on dashboard. But in the EC2 instance, it can be found as arn:aws:sts::11111111:assumed-role/ec2-instance-profile-role/i-00000000(B).
$ aws sts get-caller-identity
{
"Account": "11111111",
"UserId": "AAAAAAAAAAAAAAA:i-000000000000",
"Arn": "arn:aws:sts::11111111:assumed-role/ec2-instance-profile-role/i-00000000"
}
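The two forms relate as follows: (B) is the ARN of the temporary session that STS hands out, while (A) is the IAM role behind it. As far as I understand, both aws-auth mapRoles and the role argument of aws eks get-token expect form (A), never form (B). A minimal sketch of the conversion, with a hypothetical function name:

```python
def assumed_role_to_role_arn(sts_arn: str) -> str:
    """Convert an STS assumed-role session ARN into the IAM role ARN behind it."""
    parts = sts_arn.split(":", 5)
    is_session_arn = (
        len(parts) == 6
        and parts[0] == "arn"
        and parts[2] == "sts"
        and parts[5].startswith("assumed-role/")
    )
    if not is_session_arn:
        raise ValueError(f"not an assumed-role ARN: {sts_arn}")
    partition, account_id = parts[1], parts[4]
    # The resource part looks like assumed-role/<role-name>/<session-name>.
    role_name = parts[5].split("/")[1]
    return f"arn:{partition}:iam::{account_id}:role/{role_name}"
```

Feeding it the Arn value from the get-caller-identity output yields the role ARN shown on the dashboard.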
I also created an aws-auth config map in the EKS cluster's Kubernetes system config, so that the EC2 instance profile role is registered and allowed access. I tried both A and B in mapRoles, and both resulted in the same issue. When I run kubectl on the EC2 instance:
$ aws eks --region aws-region update-kubeconfig --name eks-cluster-name
$ kubectl config view --minify
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: DATA+OMITTED
server: https://xxxxxxxxxxxxxxxxxxxxxxxxxxxx.aw1.aws-region.eks.amazonaws.com
name: arn:aws:eks:aws-region:11111111:cluster/eks-cluster-name
contexts:
- context:
cluster: arn:aws:eks:aws-region:11111111:cluster/eks-cluster-name
user: arn:aws:eks:aws-region:11111111:cluster/eks-cluster-name
name: arn:aws:eks:aws-region:11111111:cluster/eks-cluster-name
current-context: arn:aws:eks:aws-region:11111111:cluster/eks-cluster-name
kind: Config
preferences: {}
users:
- name: arn:aws:eks:aws-region:11111111:cluster/eks-cluster-name
user:
exec:
apiVersion: client.authentication.k8s.io/v1alpha1
args:
- --region
- aws-region
- eks
- get-token
- --cluster-name
- eks-cluster-name
- --role
- arn:aws:sts::11111111:assumed-role/ec2-instance-profile-role/i-00000000
command: aws
env: null
provideClusterInfo: false
$ kubectl get svc
error: You must be logged in to the server (Unauthorized)
I also checked the type of the assumed role. It's Service but not AWS.
It seems this type is necessary.
{
  "Version": "2012-10-17",
  "Statement": {
    "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::333333333333:root" },
    "Action": "sts:AssumeRole"
  }
}
Terraform aws assume role
But I tried to create a new assume role with AWS type and set it to Kubernetes' aws-auth config map, still the same issue.
How to use it? Do I need to create a new IAM user to use?
- name: external-staging
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      args:
      - exec
      - test-dev
      - --
      - aws
      - eks
      - get-token
      - --cluster-name
      - eksCluster-1234
      - --role-arn
      - arn:aws:iam::3456789002:role/eks-cluster-admin-role-e65f32f
      command: aws-vault
      env: null
This config file works for me. Note that the flag should be --role-arn and the command should be aws-vault.

Not authorized to perform sts:AssumeRoleWithWebIdentity- 403

I have been trying to run an external-dns pod using the guide provided by k8s-sig group. I have followed every step of the guide, and getting the below error.
time="2021-02-27T13:27:20Z" level=error msg="records retrieval failed: failed to list hosted zones: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: 87a3ca86-ceb0-47be-8f90-25d0c2de9f48"
I had created AWS IAM policy using Terraform, and it was successfully created. Except IAM Role for service account for which I had used eksctl, everything else has been spun via Terraform.
But then I got hold of this article, which says that creating the AWS IAM policy with awscli would eliminate this error. So I deleted the policy created with Terraform and recreated it with awscli. Yet it is throwing the same error.
Below is my external dns yaml file.
apiVersion: v1
kind: ServiceAccount
metadata:
name: external-dns
# If you're using Amazon EKS with IAM Roles for Service Accounts, specify the following annotation.
# Otherwise, you may safely omit it.
annotations:
# Substitute your account ID and IAM service role name below.
eks.amazonaws.com/role-arn: arn:aws:iam::268xxxxxxx:role/eksctl-ats-Eks1-addon-iamserviceaccoun-Role1-WMLL93xxxx
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: external-dns
rules:
- apiGroups: [""]
resources: ["services","endpoints","pods"]
verbs: ["get","watch","list"]
- apiGroups: ["extensions","networking.k8s.io"]
resources: ["ingresses"]
verbs: ["get","watch","list"]
- apiGroups: [""]
resources: ["nodes"]
verbs: ["list","watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: external-dns-viewer
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: external-dns
subjects:
- kind: ServiceAccount
name: external-dns
namespace: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: external-dns
spec:
strategy:
type: Recreate
selector:
matchLabels:
app: external-dns
template:
metadata:
labels:
app: external-dns
spec:
serviceAccountName: external-dns
containers:
- name: external-dns
image: k8s.gcr.io/external-dns/external-dns:v0.7.6
args:
- --source=service
- --source=ingress
- --domain-filter=xyz.com # will make ExternalDNS see only the hosted zones matching provided domain, omit to process all available hosted zones
- --provider=aws
- --policy=upsert-only # would prevent ExternalDNS from deleting any records, omit to enable full synchronization
- --aws-zone-type=public # only look at public hosted zones (valid values are public, private or no value for both)
- --registry=txt
- --txt-owner-id=Z0471542U7WSPZxxxx
securityContext:
fsGroup: 65534 # For ExternalDNS to be able to read Kubernetes and AWS token files
I am scratching my head as there is no proper solution to this error anywhere in the net. Hoping to find a solution to this issue in this forum.
End result must show something like below and fill up records in hosted zone.
time="2020-05-05T02:57:31Z" level=info msg="All records are already up to date"
I also struggled with this error.
The problem was in the definition of the trust relationship.
You can see the following setup in some official AWS tutorials (like this):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:<my-namespace>:<my-service-account>"
        }
      }
    }
  ]
}
Option 1 for failure
My problem was that I passed the wrong value for my-service-account at the end of ${OIDC_PROVIDER}:sub in the Condition part.
Option 2 for failure
After the previous fix - I still faced the same error - it was solved by following this aws tutorial which shows the output of using the eksctl with the command below:
eksctl create iamserviceaccount \
--name my-serviceaccount \
--namespace <your-ns> \
--cluster <your-cluster-name> \
--attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
--approve
When you look at the trust relationship tab in the AWS web console, you can see that an additional condition was added, with the :aud suffix and the value sts.amazonaws.com.
So this condition needs to be added after the "${OIDC_PROVIDER}:sub" condition.
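Putting both options together, the trust policy ends up with a :sub and an :aud condition under the same StringEquals. Here is a minimal sketch that builds such a document; the function name, parameter names, and the partition default are my own assumptions for illustration, not part of any AWS tooling.

```python
import json


def irsa_trust_policy(account_id, oidc_provider, namespace, service_account,
                      partition="aws"):
    """Build a trust policy with both the :sub and :aud conditions described above."""
    federated = f"arn:{partition}:iam::{account_id}:oidc-provider/{oidc_provider}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": federated},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Must match the namespace and name of the real ServiceAccount.
                        f"{oidc_provider}:sub":
                            f"system:serviceaccount:{namespace}:{service_account}",
                        # The extra condition that eksctl adds.
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


def as_json(policy):
    """Render the policy as indented JSON, ready to paste into the console."""
    return json.dumps(policy, indent=2)
```

The oidc_provider argument is the issuer URL without the https:// prefix, in the same form as ${OIDC_PROVIDER} above.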
I was able to get help from the Kubernetes Slack (shout out to #Rob Del) and this is what we came up with. There's nothing wrong with the k8s rbac from the article, the issue is the way the IAM role is written. I am using Terraform v0.12.24, but I believe something similar to the following .tf should work for Terraform v0.14:
data "aws_caller_identity" "current" {}
resource "aws_iam_role" "external_dns_role" {
name = "external-dns"
assume_role_policy = jsonencode({
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": format(
"arn:aws:iam::${data.aws_caller_identity.current.account_id}:%s",
replace(
"${aws_eks_cluster.<YOUR_CLUSTER_NAME>.identity[0].oidc[0].issuer}",
"https://",
"oidc-provider/"
)
)
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
format(
"%s:sub",
trimprefix(
"${aws_eks_cluster.<YOUR_CLUSTER_NAME>.identity[0].oidc[0].issuer}",
"https://"
)
) : "system:serviceaccount:default:external-dns"
}
}
}
]
})
}
The above .tf assumes you created your EKS cluster using Terraform and that you use the RBAC manifest from the external-dns tutorial.
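If the replace and trimprefix calls in that .tf are hard to follow, this is the same string manipulation spelled out in Python. It is only an illustration of what the two expressions evaluate to; the function name and dictionary keys are hypothetical.

```python
def oidc_names_from_issuer(issuer_url, account_id, partition="aws"):
    """Derive the provider ARN and the :sub condition key from an OIDC issuer URL."""
    scheme = "https://"
    if not issuer_url.startswith(scheme):
        raise ValueError(f"issuer URL must start with {scheme}: {issuer_url}")
    # Same effect as trimprefix(issuer, "https://") in the Terraform above.
    provider = issuer_url[len(scheme):]
    return {
        # Same effect as replace(issuer, "https://", "oidc-provider/") inside format().
        "provider_arn": f"arn:{partition}:iam::{account_id}:oidc-provider/{provider}",
        "sub_key": f"{provider}:sub",
    }
```

The issuer URL is the value reported for the cluster under identity.oidc.issuer.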
I have a few possibilities here.
Before anything else, does your cluster have an OIDC provider associated with it? IRSA won't work without it.
You can check that in the AWS console, or via the CLI with:
aws eks describe-cluster --name {name} --query "cluster.identity.oidc.issuer"
First
Delete the iamserviceaccount, recreate it, remove the ServiceAccount definition from your ExternalDNS manifest (the entire first section) and re-apply it.
eksctl delete iamserviceaccount --name {name} --namespace {namespace} --cluster {cluster}
eksctl create iamserviceaccount --name {name} --namespace {namespace} --cluster {cluster} --attach-policy-arn {policy-arn} --approve --override-existing-serviceaccounts
kubectl apply -n {namespace} -f {your-externaldns-manifest.yaml}
It may be that there is a conflict because you have overwritten what you created with eksctl create iamserviceaccount by also specifying a ServiceAccount in your ExternalDNS manifest.
Second
Upgrade your cluster to v1.19 (if it's not there already):
eksctl upgrade cluster --name {name} will show you what will be done;
eksctl upgrade cluster --name {name} --approve will do it
Third
Some documentation suggests that in addition to setting securityContext.fsGroup: 65534, you also need to set securityContext.runAsUser: 0.
I've been struggling with a similar issue after following the setup suggested here
I ended up with the exception below in the deploy logs.
time="2021-05-10T06:40:17Z" level=error msg="records retrieval failed: failed to list hosted zones: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: 3fda6c69-2a0a-4bc9-b478-521b5131af9b"
time="2021-05-10T06:41:20Z" level=error msg="records retrieval failed: failed to list hosted zones: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: 7d3e07a2-c514-44fa-8e79-d49314d9adb6"
In my case, it was an issue with wrong Service account name mapped to the new role created.
Here is a step-by-step approach to get this done without many hiccups.
Create the IAM Policy
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"route53:ChangeResourceRecordSets"
],
"Resource": [
"arn:aws:route53:::hostedzone/*"
]
},
{
"Effect": "Allow",
"Action": [
"route53:ListHostedZones",
"route53:ListResourceRecordSets"
],
"Resource": [
"*"
]
}
]
}
Create the IAM role and the service account for your EKS cluster.
eksctl create iamserviceaccount \
--name external-dns-sa-eks \
--namespace default \
--cluster aecops-grpc-test \
--attach-policy-arn arn:aws:iam::xxxxxxxx:policy/external-dns-policy-eks \
--approve \
--override-existing-serviceaccounts
Create a new hosted zone.
aws route53 create-hosted-zone --name "hosted.domain.com." --caller-reference "grpc-endpoint-external-dns-test-$(date +%s)"
Deploy ExternalDNS, after creating the Cluster role and Cluster role binding to the previously created service account.
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: external-dns
rules:
- apiGroups: [""]
resources: ["services","endpoints","pods"]
verbs: ["get","watch","list"]
- apiGroups: ["extensions","networking.k8s.io"]
resources: ["ingresses"]
verbs: ["get","watch","list"]
- apiGroups: [""]
resources: ["nodes"]
verbs: ["list","watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: external-dns-viewer
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: external-dns
subjects:
- kind: ServiceAccount
name: external-dns-sa-eks
namespace: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: external-dns
spec:
strategy:
type: Recreate
selector:
matchLabels:
app: external-dns
template:
metadata:
labels:
app: external-dns
# If you're using kiam or kube2iam, specify the following annotation.
# Otherwise, you may safely omit it.
annotations:
iam.amazonaws.com/role: arn:aws:iam::***********:role/eksctl-eks-cluster-name-addon-iamserviceacco-Role1-156KP94SN7D7
spec:
serviceAccountName: external-dns-sa-eks
containers:
- name: external-dns
image: k8s.gcr.io/external-dns/external-dns:v0.7.6
args:
- --source=service
- --source=ingress
- --domain-filter=hosted.domain.com. # will make ExternalDNS see only the hosted zones matching provided domain, omit to process all available hosted zones
- --provider=aws
- --policy=upsert-only # would prevent ExternalDNS from deleting any records, omit to enable full synchronization
- --aws-zone-type=public # only look at public hosted zones (valid values are public, private or no value for both)
- --registry=txt
- --txt-owner-id=my-hostedzone-identifier
securityContext:
fsGroup: 65534 # For ExternalDNS to be able to read Kubernetes and AWS token files
Update Ingress resource with the domain name and reapply the manifest.
For ingress objects, ExternalDNS will create a DNS record based on the host specified for the ingress object.
- host: myapp.hosted.domain.com
Validate new records created.
BASH-3.2$ aws route53 list-resource-record-sets --output json \
--hosted-zone-id "/hostedzone/Z065*********" --query "ResourceRecordSets[?Name == 'hosted.domain.com..']|[?Type == 'A']"
[
{
"Name": "myapp.hosted.domain.com..",
"Type": "A",
"AliasTarget": {
"HostedZoneId": "ZCT6F*******",
"DNSName": "****************.elb.ap-southeast-2.amazonaws.com.",
"EvaluateTargetHealth": true
}
} ]
In our case this issue occurred when using the Terraform module to create the EKS cluster, and eksctl to create the iamserviceaccount for the aws-load-balancer-controller. It all works fine the first go-round. But if you do a terraform destroy, you need to do some cleanup, like deleting the CloudFormation stack created by eksctl. Somehow things got crossed, and CloudTrail showed a role being passed along that was no longer valid. So check the annotation of the service account to ensure it's valid, and update it if necessary. Then, in my case, I deleted and redeployed the aws-load-balancer-controller.
%> kubectl describe serviceaccount aws-load-balancer-controller -n kube-system
Name: aws-load-balancer-controller
Namespace: kube-system
Labels: app.kubernetes.io/managed-by=eksctl
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::212222224610:role/eksctl-ch-test-addon-iamserviceaccou-Role1-JQL4R3JM7I1A
Image pull secrets: <none>
Mountable secrets: aws-load-balancer-controller-token-b8hw7
Tokens: aws-load-balancer-controller-token-b8hw7
Events: <none>
%>
%> kubectl annotate --overwrite serviceaccount aws-load-balancer-controller eks.amazonaws.com/role-arn='arn:aws:iam::212222224610:role/eksctl-ch-test-addon-iamserviceaccou-Role1-17A92GGXZRY6O' -n kube-system
In my case, attaching the Route 53 permissions policy to the OIDC role resolved the error.
https://medium.com/swlh/amazon-eks-setup-external-dns-with-oidc-provider-and-kube2iam-f2487c77b2a1
I then had the external-dns service account use that role instead of the cluster role.
annotations:
# Substitute your account ID and IAM service role name below.
eks.amazonaws.com/role-arn: arn:aws:iam::<account>:role/external-dns-service-account-oidc-role
For me the issue was that the trust relationship was (correctly) setup using one partition whereas the ServiceAccount was annotated with a different partition, like so:
...
"Principal": {
"Federated": "arn:aws-us-gov:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
},
...
kind: ServiceAccount
metadata:
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::{{ .Values.aws.account }}:role/{{ .Values.aws.roleName }}
Notice arn:aws:iam vs arn:aws-us-gov:iam
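A mismatch like this is easy to miss by eye, so a quick comparison of the partition field (the second colon-separated part of an ARN) can help. This is just a sketch; the function names are hypothetical, and the ARNs in the usage note are placeholders.

```python
def arn_partition(arn: str) -> str:
    """Return the partition field of an ARN, e.g. 'aws' or 'aws-us-gov'."""
    parts = arn.split(":")
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn}")
    return parts[1]


def same_partition(first_arn: str, second_arn: str) -> bool:
    """True when both ARNs belong to the same AWS partition."""
    return arn_partition(first_arn) == arn_partition(second_arn)
```

Comparing the Federated principal from the trust relationship with the role-arn annotation from the ServiceAccount returns False for the pair shown above.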

Authenticate an AWS SQS scaler in Keda

I have a Keda deployment that I've been trying to get to work for about a month now. At the moment, my scaler looks like this:
apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
name: {service-name}-scaler
spec:
scaleTargetRef:
deploymentName: {service-name}
containerName: {service-name}
pollingInterval: 30
cooldownPeriod: 600
minReplicaCount: 0
maxReplicaCount: 10
triggers:
- type: aws-sqs-queue
authenticationRef:
name: keda-trigger-authentication
metadata:
queueURL: https://sqs.ap-northeast-1.amazonaws.com/{AWS ID}/{Queue-name}
queueLength: "1"
awsRegion: "ap-northeast-1"
identityOwner: pod
The associated trigger authentication and secret are:
apiVersion: v1
kind: Secret
metadata:
name: keda-secrets
data:
AWS_ACCESS_KEY_ID: {base64-encoded-string}
AWS_SECRET_ACCESS_KEY: {base64-encoded-string}
KEDA_ROLE_ARN: {base64-encoded-string}
---
apiVersion: keda.k8s.io/v1alpha1
kind: TriggerAuthentication
metadata:
name: keda-trigger-authentication
spec:
env:
- parameter: awsRegion
name: AWS_REGION
- parameter: awsAccessKeyID
name: AWS_ACCESS_KEY_ID
- parameter: awsSecretAccessKey
name: AWS_SECRET_ACCESS_KEY
- parameter: awsRoleArn
name: KEDA_ROLE_ARN
secretTargetRef:
- parameter: awsRoleArn
name: keda-secrets
key: KEDA_ROLE_ARN
I understand that the KEDA_ROLE_ARN value is repeated here; I left both for debugging purposes. The order of deploying this is as follows:
Install common environment variables. This is where the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and KEDA_ROLE_ARN values are stored. The AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY values are listed as AWS_ACCESS_KEY_ID_ASSUME and AWS_SECRET_ACCESS_KEY_ASSUME respectively in the file and will take on their proper names on the container. Again, these are duplicated for debugging purposes. I would prefer to use these values rather than a separate secret.
Install Keda pods with Helm
Deploy the keda-secrets secret and the keda-trigger-authentication trigger authentication
Deploy the container that should be scaled. This is where the AWS_ACCESS_KEY_ID_ASSUME value will assume the name of AWS_ACCESS_KEY_ID and the AWS_SECRET_ACCESS_KEY_ASSUME value will assume the name of AWS_SECRET_ACCESS_KEY and where the AWS_REGION value is defined.
The scaled object is deployed
For some reason, I keep getting an error from AWS when the scaler attempts to scale saying that there are no credential providers in the chain. It appears that the AWS credentials are not being sent. What am I doing wrong here?
I will show you two ways to successfully scale a deployment based on AWS SQS.
First way: Using the AWS IAM role attached to the node
If your IAM role (the node role) has permission to access SQS, this is the easier option: just change the identityOwner: pod field to identityOwner: operator so that KEDA uses the node role to access AWS SQS.
Sample ScaledObject file with SQS trigger
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: aws-sqs-queue-scaledobject
namespace: default
spec:
scaleTargetRef:
name: test-deployment
minReplicaCount: 0
maxReplicaCount: 2
triggers:
- type: aws-sqs-queue
metadata:
queueURL: https://sqs.us-east-1.amazonaws.com/3243234432432/Queue
queueLength: "5"
awsRegion: "us-east-1"
identityOwner: operator
Second way: Using IAM user
In this approach, we need to create the objects below:
Create IAM user in AWS.
Create secret in Kubernetes.
Create TriggerAuthentication in Kubernetes.
Create scaledObject in Kubernetes.
Create the IAM user and give it SQS permissions.
First, base64-encode the IAM user's access key and secret key; the encoded values are required when creating the Kubernetes secret.
Create secret
apiVersion: v1
kind: Secret
metadata:
name: test-secrets
namespace: default
data:
AWS_ACCESS_KEY_ID: <base64-encoded-key>
AWS_SECRET_ACCESS_KEY: <base64-encoded-secret-key>
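The base64 values for the data section can be produced in several ways. Here is one small sketch in Python; the function name is made up. A common pitfall is a trailing newline sneaking into the encoded value, so this helper refuses input that ends with one.

```python
import base64


def to_secret_value(raw: str) -> str:
    """Base64-encode a credential string for the data section of a Secret."""
    if raw.endswith("\n"):
        # A trailing newline would become part of the credential once decoded.
        raise ValueError("value ends with a newline; strip it before encoding")
    return base64.b64encode(raw.encode("utf-8")).decode("ascii")
```

Run it once for the access key and once for the secret key, then paste the results into the Secret manifest.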
Create the TriggerAuthentication, which will be referenced by the ScaledObject
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: keda-trigger-auth-aws-credentials
namespace: default
spec:
secretTargetRef:
- parameter: awsAccessKeyID # Required.
name: test-secrets # Required.
key: AWS_ACCESS_KEY_ID # Required.
- parameter: awsSecretAccessKey # Required.
name: test-secrets # Required.
key: AWS_SECRET_ACCESS_KEY # Required.
Create the ScaledObject to map KEDA to the deployment you want to scale based on the SQS trigger
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: aws-sqs-queue-scaledobject
namespace: default
spec:
scaleTargetRef:
name: test-deployment
minReplicaCount: 0
maxReplicaCount: 2
triggers:
- type: aws-sqs-queue
authenticationRef:
name: keda-trigger-auth-aws-credentials
metadata:
queueURL: https://sqs.us-east-1.amazonaws.com/012345678912/Queue
queueLength: "5"
awsRegion: "us-east-1"