AWS EKS Fargate logging to AWS Cloudwatch: log groups are not creating - amazon-web-services

I used official procedure from AWS and this one to enable logging.
Here is yaml files I've applied:
---
kind: Namespace
apiVersion: v1
metadata:
name: aws-observability
labels:
aws-observability: enabled
---
kind: ConfigMap
apiVersion: v1
metadata:
name: aws-logging
namespace: aws-observability
data:
flb_log_cw: "true"
output.conf: |
[OUTPUT]
Name cloudwatch_logs
Match *
region us-east-1
log_group_name fluent-bit-cloudwatch
log_stream_prefix from-fluent-bit-
auto_create_group true
log_key log
parsers.conf: |
[PARSER]
Name crio
Format Regex
Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>P|F) (?<log>.*)$
Time_Key time
Time_Format %Y-%m-%dT%H:%M:%S.%L%z
filters.conf: |
[FILTER]
Name parser
Match *
Key_name log
Parser crio
Inside the pod I can see that logging was enabled:
apiVersion: v1
kind: Pod
metadata:
annotations:
CapacityProvisioned: 2vCPU 4GB
Logging: LoggingEnabled
kubectl.kubernetes.io/restartedAt: "2023-01-17T19:31:20+01:00"
kubernetes.io/psp: eks.privileged
creationTimestamp: "2023-01-17T18:31:28Z"
Logs exists inside the container:
kubectl logs dev-768647846c-hbmv7 -n dev-fargate
But in AWS CloudWatch log groups are not created, even for fluent-bit itself
From the pod cli I can create log groups in AWS Cloudwatch, so the permissions are ok
I also tried cloudwatch instead of cloudwatch_logs plugin, but no luck

I've solved my issue.
The tricky thing is: IAM policy must be attached to the default pod execution role which created automatically with the namespace and it has no relation to the service account \ custom pod execution role

Related

EKS: can't see nodes and nodes are not join to the cluster

I read all aws articles. I followed each one by one. But it didn't work any of them. Let me briefly summarize my situation. I created EKS automation with terraform. 1 vpc, 3 public subnets, 3 private subnets, 3 security group, 1 nat gateway(on public), and 2 autoscaled worker node groups. I checked all infra which created with terraform. There are no problem.
My main problem is that after the installation I can't see the nodes and nodes are not join to the cluster. I applied below steps but didn't worked. What should I do? By the way don't tag my question as a duplication I checked all similar questions on stackoverflow. My steps look true but does not work.
kubectl get nodes
No resources found
Before checking node with above command.Firstly I applied below command for setting kubeconfig.
aws eks update-kubeconfig --name eks-DS7h --region us-east-1
Here my kubeconfig:
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: LS0tLS1CRUdJfgzsfhadfzasdfrzsd.........
server: https://0F97E579A.gr7.us-east-1.eks.amazonaws.com
name: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
contexts:
- context:
cluster: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
user: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
name: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
current-context: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
kind: Config
preferences: {}
users:
- name: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
user:
exec:
apiVersion: client.authentication.k8s.io/v1beta1
args:
- --region
- us-east-1
- eks
- get-token
- --cluster-name
- eks-DS7h
command: aws
After this I checked the nodes again but I still get no resource found. Than I try to edit aws-auth. Before the edit I check my user on the terminal where I triggered all terraform steps installation.
aws sts get-caller-identity
{
"UserId": "ASDFGSDFGDGSDGDFHSFDSDC",
"Account": "545153234644",
"Arn": "arn:aws:iam::545153234644:user/white"
}
I took my user info and I added blank mapuser area in aws-auth. But still getting No resources found.
kubectl get cm -n kube-system aws-auth
apiVersion: v1
data:
mapAccounts: |
[]
mapRoles: |
- "groups":
- "system:bootstrappers"
- "system:nodes"
- "system:masters"
"rolearn": "arn:aws:iam::545153234644:role/eks-DS7h22060508195731770000000e"
"username": "system:node:{{EC2PrivateDNSName}}"
mapUsers: "- \"userarn\": \"arn:aws:iam::545153234644:user/white\"\n \"username\":
\"white\"\n \"groups\":\n - \"system:masters\"\n - \"system:nodes\" \n"
kind: ConfigMap
metadata:
creationTimestamp: "2022-06-05T08:20:02Z"
labels:
app.kubernetes.io/managed-by: Terraform
terraform.io/module: terraform-aws-modules.eks.aws
name: aws-auth
namespace: kube-system
resourceVersion: "4976"
uid: b12341-33ff-4f78-af0a-758f88
Oh also when I check EKS cluster on dashboard I see below warning too. I don't know is it relevant or not. I want to share it too maybe it will help.

mapping values are not allowed in this context, AWS EKS Grafana

mkdir ${HOME}/environment/grafana
cat << EoF > ${HOME}/environment/grafana/grafana.yaml
datasources:
datasources.yaml:
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
url: http://prometheus-server.prometheus.svc.cluster.local
access: proxy
isDefault: true
EoF
I get this error when I try deploy grafana to eks cluster

Fixing DataDog agent congestion issues in Amazon EKS cluster

A few months ago I integrated DataDog into my Kubernetes cluster by using a DaemonSet configuration. Since then I've been getting congestion alerts with the following message:
Please tune the hot-shots settings
https://github.com/brightcove/hot-shots#errors
By attempting to follow the docs with my limited Orchestration/DevOps knowledge, what I could gather is that I need to add the following to my DaemonSet config:
spec
.
.
securityContext:
sysctls:
- name: net.unix.max_dgram_qlen
value: "1024"
- name: net.core.wmem_max
value: "4194304"
I attempted to add that configuration piece to one of the auto-deployed DataDog pods directly just to try it out but it hangs indefinitely and doesn't save the configuration (Instead of adding to DaemonSet and risking bringing all agents down).
That hot-shots documentation also mentions that the above sysctl configuration requires unsafe sysctls to be enabled in the nodes that contain the pods:
kubelet --allowed-unsafe-sysctls \
'net.unix.max_dgram_qlen, net.core.wmem_max'
The cluster I am working with is fully deployed with EKS by using the Dashboard in AWS (Little knowledge on how it is configured). The above seems to be indicated for manually deployed and managed cluster.
Why is the configuration I am attempting to apply to a single DataDog agent pod not saving/applying? Is it because it is managed by DaemonSet or is it because it doesn't have the proper unsafe sysctl allowed? Something else?
If I do need to enable the suggested unsafe sysctlon all nodes of my cluster. How do I go about it since the cluster is fully deployed and managed by Amazon EKS?
So we managed to achieve this using a custom launch template with our managed node group and then passing in a custom bootstrap script. This does mean however you need to supply the AMI id yourself and lose the alerts in the console when it is outdated. In Terraform this would look like:
resource "aws_eks_node_group" "group" {
...
launch_template {
id = aws_launch_template.nodes.id
version = aws_launch_template.nodes.latest_version
}
...
}
data "template_file" "bootstrap" {
template = file("${path.module}/files/bootstrap.tpl")
vars = {
cluster_name = aws_eks_cluster.cluster.name
cluster_auth_base64 = aws_eks_cluster.cluster.certificate_authority.0.data
endpoint = aws_eks_cluster.cluster.endpoint
}
}
data "aws_ami" "eks_node" {
owners = ["602401143452"]
most_recent = true
filter {
name = "name"
values = ["amazon-eks-node-1.21-v20211008"]
}
}
resource "aws_launch_template" "nodes" {
...
image_id = data.aws_ami.eks_node.id
user_data = base64encode(data.template_file.bootstrap.rendered)
...
}
Then the bootstrap.hcl file looks like this:
#!/bin/bash
set -o xtrace
systemctl stop kubelet
/etc/eks/bootstrap.sh '${cluster_name}' \
--b64-cluster-ca '${cluster_auth_base64}' \
--apiserver-endpoint '${endpoint}' \
--kubelet-extra-args '"--allowed-unsafe-sysctls=net.unix.max_dgram_qlen"'
The next step is to set up the PodSecurityPolicy, ClusterRole and RoleBinding in your cluster so you can use the securityContext as you described above and then pods in that namespace will be able to run without a SysctlForbidden message.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
name: sysctl
spec:
allowPrivilegeEscalation: false
allowedUnsafeSysctls:
- net.unix.max_dgram_qlen
defaultAllowPrivilegeEscalation: false
fsGroup:
rule: RunAsAny
runAsUser:
rule: RunAsAny
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- '*'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: allow-sysctl
rules:
- apiGroups:
- policy
resourceNames:
- sysctl
resources:
- podsecuritypolicies
verbs:
- '*'
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: allow-sysctl
namespace: app-namespace
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: allow-sysctl
subjects:
- apiGroup: rbac.authorization.k8s.io
kind: Group
name: system:serviceaccounts:app-namespace
If using the DataDog Helm chart you can set the following values to update the securityContext of the agent. But you will have to update the chart PSP manually to set allowedUnsafeSysctls
datadog:
securityContext:
sysctls:
- name: net.unix.max_dgram_qlen"
value: 512"

Not authorized to perform sts:AssumeRoleWithWebIdentity- 403

I have been trying to run an external-dns pod using the guide provided by k8s-sig group. I have followed every step of the guide, and getting the below error.
time="2021-02-27T13:27:20Z" level=error msg="records retrieval failed: failed to list hosted zones: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: 87a3ca86-ceb0-47be-8f90-25d0c2de9f48"
I had created AWS IAM policy using Terraform, and it was successfully created. Except IAM Role for service account for which I had used eksctl, everything else has been spun via Terraform.
But then I got hold of this article which says creating AWS IAM policy using awscli would eliminate this error. So I deleted the policy created using Terraform, and recreated it with awscli. Yet, it is throwing the same error error.
Below is my external dns yaml file.
apiVersion: v1
kind: ServiceAccount
metadata:
name: external-dns
# If you're using Amazon EKS with IAM Roles for Service Accounts, specify the following annotation.
# Otherwise, you may safely omit it.
annotations:
# Substitute your account ID and IAM service role name below.
eks.amazonaws.com/role-arn: arn:aws:iam::268xxxxxxx:role/eksctl-ats-Eks1-addon-iamserviceaccoun-Role1-WMLL93xxxx
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: external-dns
rules:
- apiGroups: [""]
resources: ["services","endpoints","pods"]
verbs: ["get","watch","list"]
- apiGroups: ["extensions","networking.k8s.io"]
resources: ["ingresses"]
verbs: ["get","watch","list"]
- apiGroups: [""]
resources: ["nodes"]
verbs: ["list","watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: external-dns-viewer
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: external-dns
subjects:
- kind: ServiceAccount
name: external-dns
namespace: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: external-dns
spec:
strategy:
type: Recreate
selector:
matchLabels:
app: external-dns
template:
metadata:
labels:
app: external-dns
spec:
serviceAccountName: external-dns
containers:
- name: external-dns
image: k8s.gcr.io/external-dns/external-dns:v0.7.6
args:
- --source=service
- --source=ingress
- --domain-filter=xyz.com # will make ExternalDNS see only the hosted zones matching provided domain, omit to process all available hosted zones
- --provider=aws
- --policy=upsert-only # would prevent ExternalDNS from deleting any records, omit to enable full synchronization
- --aws-zone-type=public # only look at public hosted zones (valid values are public, private or no value for both)
- --registry=txt
- --txt-owner-id=Z0471542U7WSPZxxxx
securityContext:
fsGroup: 65534 # For ExternalDNS to be able to read Kubernetes and AWS token files
I am scratching my head as there is no proper solution to this error anywhere in the net. Hoping to find a solution to this issue in this forum.
End result must show something like below and fill up records in hosted zone.
time="2020-05-05T02:57:31Z" level=info msg="All records are already up to date"
I also struggled with this error.
The problem was in the definition of the trust relationship.
You can see in some offical aws tutorials (like this) the following setup:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"${OIDC_PROVIDER}:sub": "system:serviceaccount:<my-namespace>:<my-service-account>"
}
}
}
]
}
Option 1 for failure
My problem was that I passed the a wrong value for my-service-account at the end of ${OIDC_PROVIDER}:sub in the Condition part.
Option 2 for failure
After the previous fix - I still faced the same error - it was solved by following this aws tutorial which shows the output of using the eksctl with the command below:
eksctl create iamserviceaccount \
--name my-serviceaccount \
--namespace <your-ns> \
--cluster <your-cluster-name> \
--attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
--approve
When you look at the output in the trust relationship tab in the AWS web console - you can see that an additional condition was added with the postfix of :aud and the value of sts.amazonaws.com:
So this need to be added after the "${OIDC_PROVIDER}:sub" condition.
I was able to get help from the Kubernetes Slack (shout out to #Rob Del) and this is what we came up with. There's nothing wrong with the k8s rbac from the article, the issue is the way the IAM role is written. I am using Terraform v0.12.24, but I believe something similar to the following .tf should work for Terraform v0.14:
data "aws_caller_identity" "current" {}
resource "aws_iam_role" "external_dns_role" {
name = "external-dns"
assume_role_policy = jsonencode({
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": format(
"arn:aws:iam::${data.aws_caller_identity.current.account_id}:%s",
replace(
"${aws_eks_cluster.<YOUR_CLUSTER_NAME>.identity[0].oidc[0].issuer}",
"https://",
"oidc-provider/"
)
)
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
format(
"%s:sub",
trimprefix(
"${aws_eks_cluster.<YOUR_CLUSTER_NAME>.identity[0].oidc[0].issuer}",
"https://"
)
) : "system:serviceaccount:default:external-dns"
}
}
}
]
})
}
The above .tf assume you created your eks cluster using terraform and that you use the rbac manifest from the external-dns tutorial.
I have a few possibilities here.
Before anything else, does your cluster have an OIDC provider associated with it? IRSA won't work without it.
You can check that in the AWS console, or via the CLI with:
aws eks describe-cluster --name {name} --query "cluster.identity.oidc.issuer"
First
Delete the iamserviceaccount, recreate it, remove the ServiceAccount definition from your ExternalDNS manfiest (the entire first section) and re-apply it.
eksctl delete iamserviceaccount --name {name} --namespace {namespace} --cluster {cluster}
eksctl create iamserviceaccount --name {name} --namespace {namespace} --cluster
{cluster} --attach-policy-arn {policy-arn} --approve --override-existing-serviceaccounts
kubectl apply -n {namespace} -f {your-externaldns-manifest.yaml}
It may be that there is some conflict going on as you have overwritten what you created with eksctl createiamserviceaccount by also specifying a ServiceAccount in your ExternalDNS manfiest.
Second
Upgrade your cluster to v1.19 (if it's not there already):
eksctl upgrade cluster --name {name} will show you what will be done;
eksctl upgrade cluster --name {name} --approve will do it
Third
Some documentation suggests that in addition to setting securityContext.fsGroup: 65534, you also need to set securityContext.runAsUser: 0.
I've been struggling with a similar issue after following the setup suggested here
I ended up with the exception below in the deploy logs.
time="2021-05-10T06:40:17Z" level=error msg="records retrieval failed: failed to list hosted zones: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: 3fda6c69-2a0a-4bc9-b478-521b5131af9b"
time="2021-05-10T06:41:20Z" level=error msg="records retrieval failed: failed to list hosted zones: WebIdentityErr: failed to retrieve credentials\ncaused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n\tstatus code: 403, request id: 7d3e07a2-c514-44fa-8e79-d49314d9adb6"
In my case, it was an issue with wrong Service account name mapped to the new role created.
Here is a step by step approach to get this done without much hiccups.
Create the IAM Policy
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"route53:ChangeResourceRecordSets"
],
"Resource": [
"arn:aws:route53:::hostedzone/*"
]
},
{
"Effect": "Allow",
"Action": [
"route53:ListHostedZones",
"route53:ListResourceRecordSets"
],
"Resource": [
"*"
]
}
]
}
Create the IAM role and the service account for your EKS cluster.
eksctl create iamserviceaccount \
--name external-dns-sa-eks \
--namespace default \
--cluster aecops-grpc-test \
--attach-policy-arn arn:aws:iam::xxxxxxxx:policy/external-dns-policy-eks \
--approve
--override-existing-serviceaccounts
Created new hosted zone.
aws route53 create-hosted-zone --name "hosted.domain.com." --caller-reference "grpc-endpoint-external-dns-test-$(date +%s)"
Deploy ExternalDNS, after creating the Cluster role and Cluster role binding to the previously created service account.
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: external-dns
rules:
- apiGroups: [""]
resources: ["services","endpoints","pods"]
verbs: ["get","watch","list"]
- apiGroups: ["extensions","networking.k8s.io"]
resources: ["ingresses"]
verbs: ["get","watch","list"]
- apiGroups: [""]
resources: ["nodes"]
verbs: ["list","watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
name: external-dns-viewer
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: external-dns
subjects:
- kind: ServiceAccount
name: external-dns-sa-eks
namespace: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: external-dns
spec:
strategy:
type: Recreate
selector:
matchLabels:
app: external-dns
template:
metadata:
labels:
app: external-dns
# If you're using kiam or kube2iam, specify the following annotation.
# Otherwise, you may safely omit it.
annotations:
iam.amazonaws.com/role: arn:aws:iam::***********:role/eksctl-eks-cluster-name-addon-iamserviceacco-Role1-156KP94SN7D7
spec:
serviceAccountName: external-dns-sa-eks
containers:
- name: external-dns
image: k8s.gcr.io/external-dns/external-dns:v0.7.6
args:
- --source=service
- --source=ingress
- --domain-filter=hosted.domain.com. # will make ExternalDNS see only the hosted zones matching provided domain, omit to process all available hosted zones
- --provider=aws
- --policy=upsert-only # would prevent ExternalDNS from deleting any records, omit to enable full synchronization
- --aws-zone-type=public # only look at public hosted zones (valid values are public, private or no value for both)
- --registry=txt
- --txt-owner-id=my-hostedzone-identifier
securityContext:
fsGroup: 65534 # For ExternalDNS to be able to read Kubernetes and AWS token files
Update Ingress resource with the domain name and reapply the manifest.
For ingress objects, ExternalDNS will create a DNS record based on the host specified for the ingress object.
- host: myapp.hosted.domain.com
Validate new records created.
BASH-3.2$ aws route53 list-resource-record-sets --output json
--hosted-zone-id "/hostedzone/Z065*********" --query "ResourceRecordSets[?Name == 'hosted.domain.com..']|[?Type == 'A']"
[
{
"Name": "myapp.hosted.domain.com..",
"Type": "A",
"AliasTarget": {
"HostedZoneId": "ZCT6F*******",
"DNSName": "****************.elb.ap-southeast-2.amazonaws.com.",
"EvaluateTargetHealth": true
}
} ]
In our case this issue occurred when using the Terraform module to create the eks cluster, and eksctl to create the iamserviceaccount for the aws-load-balancer controller. It all works fine the first go-round. But if you do a terraform destroy, you need to do some cleanup, like delete the CloudFormation script created by eksctl. Somehow things got crossed, and the CloudTrail was passing along a resource role that was no longer valid. So check the annotation of the service account to ensure it's valid, and update it if necessary. Then in my case I deleted and redeployed the aws-load-balancer-controller
%> kubectl describe serviceaccount aws-load-balancer-controller -n kube-system
Name: aws-load-balancer-controller
Namespace: kube-system
Labels: app.kubernetes.io/managed-by=eksctl
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::212222224610:role/eksctl-ch-test-addon-iamserviceaccou-Role1-JQL4R3JM7I1A
Image pull secrets: <none>
Mountable secrets: aws-load-balancer-controller-token-b8hw7
Tokens: aws-load-balancer-controller-token-b8hw7
Events: <none>
%>
%> kubectl annotate --overwrite serviceaccount aws-load-balancer-controller eks.amazonaws.com/role-arn='arn:aws:iam::212222224610:role/eksctl-ch-test-addon-iamserviceaccou-Role1-17A92GGXZRY6O' -n kube-system
In my case, I was able to attach the oidc role with route53 permissions policy and that resolved the error.
https://medium.com/swlh/amazon-eks-setup-external-dns-with-oidc-provider-and-kube2iam-f2487c77b2a1
and then with the external-dns service account used that instead of the cluster role.
annotations:
# # Substitute your account ID and IAM service role name below.
eks.amazonaws.com/role-arn: arn:aws:iam::<account>:role/external-dns-service-account-oidc-role
For me the issue was that the trust relationship was (correctly) setup using one partition whereas the ServiceAccount was annotated with a different partition, like so:
...
"Principal": {
"Federated": "arn:aws-us-gov:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
},
...
kind: ServiceAccount
metadata:
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::{{ .Values.aws.account }}:role/{{ .Values.aws.roleName }}
Notice arn:aws:iam vs arn:aws-us-gov:iam

Authenticate an AWS SQS scaler in Keda

I have a Keda deployment that I've been trying to get to work for about a month now. At the moment, my scaler looks like this:
apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
name: {service-name}-scaler
spec:
scaleTargetRef:
deploymentName: {service-name}
containerName: {service-name}
pollingInterval: 30
cooldownPeriod: 600
minReplicaCount: 0
maxReplicaCount: 10
triggers:
- type: aws-sqs-queue
authenticationRef:
name: keda-trigger-authentication
metadata:
queueURL: https://sqs.ap-northeast-1.amazonaws.com/{AWS ID}/{Queue-name}
queueLength: "1"
awsRegion: "ap-northeast-1"
identityOwner: pod
The associated trigger authentication and secret are:
apiVersion: v1
kind: Secret
metadata:
name: keda-secrets
data:
AWS_ACCESS_KEY_ID: {base64-encoded-string}
AWS_SECRET_ACCESS_KEY: {base64-encoded-string}
KEDA_ROLE_ARN: {base64-encoded-string}
---
apiVersion: keda.k8s.io/v1alpha1
kind: TriggerAuthentication
metadata:
name: keda-trigger-authentication
spec:
env:
- parameter: awsRegion
name: AWS_REGION
- parameter: awsAccessKeyID
name: AWS_ACCESS_KEY_ID
- parameter: awsSecretAccessKey
name: AWS_SECRET_ACCESS_KEY
- parameter: awsRoleArn
name: KEDA_ROLE_ARN
secretTargetRef:
- parameter: awsRoleArn
name: keda-secrets
key: KEDA_ROLE_ARN
I understand that the KEDA_ROLE_ARN value is repeated here; I left both for debugging purposes. The order of deploying this is as follows:
Install common environment variables (this is where the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and KEDA_ROLE_ARN values are stored. The AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY values are listed as AWS_ACCESS_KEY_ID_ASSUME and AWS_SECRET_ACCESS_KEY_ASSUME respectively in the file and will assume their appropriate values on the container. Again, these are duplicated for debugging purposes. I would prefer to use these values rather than a separate secret.
Install Keda pods with Helm
Deploy the keda-secrets secret and the keda-trigger-authentication trigger authentication
Deploy the container that should be scaled. This is where the AWS_ACCESS_KEY_ID_ASSUME value will assume the name of AWS_ACCESS_KEY_ID and the AWS_SECRET_ACCESS_KEY_ASSUME value will assume the name of AWS_SECRET_ACCESS_KEY and where the AWS_REGION value is defined.
The scaled object is deployed
For some reason, I keep getting an error from AWS when the scaler attempts to scale saying that there are no credential providers in the chain. It appears that the AWS credentials are not being sent. What am I doing wrong here?
I will show you two ways to successfully scale deployment based on AWS SQS
First way : Using AWS IAM role attached to node
If your IAM role (node role) has permission to SQS then accessing SQS becomes easier you just have to change identityOwner: pod field to identityOwner: operator so that KEDA can use node role to access AWS SQS
Sample ScaledObject file with SQS trigger
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: aws-sqs-queue-scaledobject
namespace: default
spec:
scaleTargetRef:
name: test-deployment
minReplicaCount: 0
maxReplicaCount: 2
triggers:
- type: aws-sqs-queue
metadata:
queueURL: https://sqs.us-east-1.amazonaws.com/3243234432432/Queue
queueLength: "5"
awsRegion: "us-east-1"
identityOwner: operator
Second way: Using IAM user
In this approach, we need to create below objects
Create IAM user in AWS.
Create secret in Kubernetes.
Create TriggerAuthentication in Kubernetes.
Create scaledObject in Kubernetes.
Create IAM user and give SQS permissions to this IAM user.
first encode IAM user Access Key and Secret key using base64 which will be required while creating Kubernetes secret.
Create secret
apiVersion: v1
kind: Secret
metadata:
name: test-secrets
namespace: default
data:
AWS_ACCESS_KEY_ID: <base64-encoded-key>
AWS_SECRET_ACCESS_KEY: <base64-encoded-secret-key>
Create TriggerAuthentication this will be used in scaledObject
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
name: keda-trigger-auth-aws-credentials
namespace: default
spec:
secretTargetRef:
- parameter: awsAccessKeyID # Required.
name: test-secrets # Required.
key: AWS_ACCESS_KEY_ID # Required.
- parameter: awsSecretAccessKey # Required.
name: test-secrets # Required.
key: AWS_SECRET_ACCESS_KEY # Required.
Create scaledObject to map keda with deployment you want to scale based on SQS trigger
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: aws-sqs-queue-scaledobject
namespace: default
spec:
scaleTargetRef:
name: test-deployment
minReplicaCount: 0
maxReplicaCount: 2
triggers:
- type: aws-sqs-queue
authenticationRef:
name: keda-trigger-auth-aws-credentials
metadata:
queueURL: https://sqs.us-east-1.amazonaws.com/012345678912/Queue
queueLength: "5"
awsRegion: "us-east-1"