AWS EKS secrets store provider - Failed to fetch secret from all regions

I am using the AWS secrets store CSI provider to sync secrets from AWS Secrets Manager into Kubernetes/EKS.
The SecretProviderClass is:
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: test-provider
spec:
  provider: aws
  parameters:
    objects: |
      - objectName: mysecret
        objectType: secretsmanager
        jmesPath:
          - path: APP_ENV
            objectAlias: APP_ENV
          - path: APP_DEBUG
            objectAlias: APP_DEBUG
And the Pod mounting these secrets is:
apiVersion: v1
kind: Pod
metadata:
  name: secret-pod
spec:
  restartPolicy: Never
  serviceAccountName: my-account
  terminationGracePeriodSeconds: 2
  containers:
    - name: dotfile-test-container
      image: registry.k8s.io/busybox
      volumeMounts:
        - name: secret-volume
          readOnly: true
          mountPath: "/mnt/secret-volume"
  volumes:
    - name: secret-volume
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: test-provider
The secret exists in the Secret Provider:
{
  "APP_ENV": "staging",
  "APP_DEBUG": false
}
(this is an example, I am aware I do not need to store these particular variables as secrets)
But when I create the resources, the Pod fails to run with
Warning  FailedMount  96s (x10 over 5m47s)  kubelet  MountVolume.SetUp failed for volume "secret-volume" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod pace/secret-dotfiles-pod, err: rpc error: code = Unknown desc = Failed to fetch secret from all regions: mysecret

Turns out the error message is very misleading. The problem in my case was due to the type of the APP_DEBUG value: changing it from a boolean to a string fixed the problem, and now the pod starts correctly.
{
  "APP_ENV": "staging",
  "APP_DEBUG": "false"
}
Seems like a bug in the provider to me.
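If you hit this, a quick way to spot non-string values before redeploying is to dump the raw secret JSON with the AWS CLI (a sketch; mysecret is the secret name from the example above):
aws secretsmanager get-secret-value \
  --secret-id mysecret \
  --query SecretString \
  --output text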

Related

Mounting AWS Secrets Manager on Kubernetes/Helm chart

I have created an apps cluster deployment on AWS EKS that is deployed using Helm. For proper operation of my app, I need to set env variables, which are secrets stored in AWS Secrets Manager. Following a tutorial, I set up my values in the values.yaml file something like this:
secretsData:
  secretName: aws-secrets
  providerName: aws
  objectName: CodeBuild
Now I have created a secrets provider class as AWS recommends: secret-provider.yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: aws-secret-provider-class
spec:
  provider: {{ .Values.secretsData.providerName }}
  parameters:
    objects: |
      - objectName: "{{ .Values.secretsData.objectName }}"
        objectType: "secretsmanager"
        jmesPath:
          - path: SP1_DB_HOST
            objectAlias: SP1_DB_HOST
          - path: SP1_DB_USER
            objectAlias: SP1_DB_USER
          - path: SP1_DB_PASSWORD
            objectAlias: SP1_DB_PASSWORD
          - path: SP1_DB_PATH
            objectAlias: SP1_DB_PATH
  secretObjects:
    - secretName: {{ .Values.secretsData.secretName }}
      type: Opaque
      data:
        - objectName: SP1_DB_HOST
          key: SP1_DB_HOST
        - objectName: SP1_DB_USER
          key: SP1_DB_USER
        - objectName: SP1_DB_PASSWORD
          key: SP1_DB_PASSWORD
        - objectName: SP1_DB_PATH
          key: SP1_DB_PATH
I mount this secret object in my deployment.yaml; the relevant section of the file looks like this:
volumeMounts:
  - name: secrets-store-volume
    mountPath: "/mnt/secrets"
    readOnly: true
env:
  - name: SP1_DB_HOST
    valueFrom:
      secretKeyRef:
        name: {{ .Values.secretsData.secretName }}
        key: SP1_DB_HOST
  - name: SP1_DB_PORT
    valueFrom:
      secretKeyRef:
        name: {{ .Values.secretsData.secretName }}
        key: SP1_DB_PORT
Further down in the same deployment file, I define secrets-store-volume as:
volumes:
  - name: secrets-store-volume
    csi:
      driver: secrets-store.csi.k8s.io
      readOnly: true
      volumeAttributes:
        secretProviderClass: aws-secret-provider-class
All drivers are installed into the cluster and permissions are set accordingly.
With helm install mydeployment helm-folder/ --dry-run I can see that all the files and values are populated as expected. Then with helm install mydeployment helm-folder/ I install the deployment into my cluster, but with kubectl get all I can see the pod is stuck at Pending with the warning Error: 'aws-secrets' not found, and it eventually times out. In the AWS CloudTrail log, I can see that the cluster made a request to access the secret and there was no error fetching it. How can I solve this, or further debug it? Thank you for your time and efforts.
Error: 'aws-secrets' not found - it looks like the CSI driver isn't creating the Kubernetes Secret that you're using to reference values.
Since the YAML files look correct, I would say it's probably the CSI driver's "Sync as Kubernetes secret" configuration - syncSecret.enabled (which is false by default).
So make sure that secrets-store-csi-driver runs with this flag set to true, for example:
helm upgrade --install csi-secrets-store \
  --namespace kube-system secrets-store-csi-driver/secrets-store-csi-driver \
  --set grpcSupportedProviders="aws" --set syncSecret.enabled="true"
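Note that even with syncSecret.enabled=true, the driver only creates the Kubernetes Secret once a pod that mounts the CSI volume has started. After the pod is running, a quick check that the synced secret exists (secret name taken from the values above):
kubectl get secret aws-secrets -o yaml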

use AWS Secrets & Configuration Provider for EKS: Error from server (BadRequest)

I'm following this AWS documentation, which explains how to properly configure AWS Secrets Manager to make it work with EKS through Kubernetes Secrets.
I successfully followed step by step all the different commands as explained in the documentation.
The only difference I get is related to this step where I have to run:
kubectl get po --namespace=kube-system
The expected output should be:
csi-secrets-store-qp9r8 3/3 Running 0 4m
csi-secrets-store-zrjt2 3/3 Running 0 4m
but instead I get:
csi-secrets-store-provider-aws-lxxcz 1/1 Running 0 5d17h
csi-secrets-store-provider-aws-rhnc6 1/1 Running 0 5d17h
csi-secrets-store-secrets-store-csi-driver-ml6jf 3/3 Running 0 5d18h
csi-secrets-store-secrets-store-csi-driver-r5cbk 3/3 Running 0 5d18h
As you can see, the names are different, but I'm quite sure that's OK :-)
The real problem starts here in step 4: I created the following YAML file (as you can see, I added some parameters):
apiVersion: secrets-store.csi.x-k8s.io/v1alpha1
kind: SecretProviderClass
metadata:
  name: aws-secrets
spec:
  provider: aws
  parameters:
    objects: |
      - objectName: "mysecret"
        objectType: "secretsmanager"
And finally I created a Pod (as explained here in step 5) using the following YAML file:
# test-deployment.yaml
kind: Pod
apiVersion: v1
metadata:
  name: nginx-secrets-store-inline
spec:
  serviceAccountName: iamserviceaccountforkeyvaultsecretmanagerresearch
  containers:
    - image: nginx
      name: nginx
      volumeMounts:
        - name: mysecret-volume
          mountPath: "/mnt/secrets-store"
          readOnly: true
  volumes:
    - name: mysecret-volume
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: "aws-secrets"
After the deployment through the command:
kubectl apply -f test-deployment.yaml -n mynamespace
The pod is not able to start properly because the following error is generated:
Error from server (BadRequest): container "nginx" in pod "nginx-secrets-store-inline" is waiting to start: ContainerCreating
But, for example, if I run the deployment with the following YAML, the Pod is successfully created:
# test-deployment.yaml
kind: Pod
apiVersion: v1
metadata:
  name: nginx-secrets-store-inline
spec:
  serviceAccountName: iamserviceaccountforkeyvaultsecretmanagerresearch
  containers:
    - image: nginx
      name: nginx
      volumeMounts:
        - name: keyvault-credential-volume
          mountPath: "/mnt/secrets-store"
          readOnly: true
  volumes:
    - name: keyvault-credential-volume
      emptyDir: {} # <<== !! LOOK HERE !!
as you can see I used
emptyDir: {}
So as far as I can see, the problem here is related to the following YAML lines:
csi:
  driver: secrets-store.csi.k8s.io
  readOnly: true
  volumeAttributes:
    secretProviderClass: "aws-secrets"
To be honest, it's not even clear in my mind what's happening here.
Perhaps I didn't properly enable the volume permissions in EKS?
Sorry, but I'm a newbie in both AWS and Kubernetes configurations.
Thanks for your time.
--- NEW INFO ---
If I run
kubectl describe pod nginx-secrets-store-inline -n mynamespace
where nginx-secrets-store-inline is the name of the pod, I get the following output:
Events:
  Type     Reason       Age                From               Message
  ----     ------       ---                ----               -------
  Normal   Scheduled    30s                default-scheduler  Successfully assigned mynamespace/nginx-secrets-store-inline to ip-10-0-24-252.eu-central-1.compute.internal
  Warning  FailedMount  14s (x6 over 29s)  kubelet            MountVolume.SetUp failed for volume "keyvault-credential-volume" : rpc error: code = Unknown desc = failed to get secretproviderclass mynamespace/aws-secrets, error: SecretProviderClass.secrets-store.csi.x-k8s.io "aws-secrets" not found
Any hints?
Finally I realized why it wasn't working. As explained here, the error:
Warning FailedMount 3s (x4 over 6s) kubelet, kind-control-plane MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to get secretproviderclass default/azure, error: secretproviderclasses.secrets-store.csi.x-k8s.io "azure" not found
is related to namespace:
The SecretProviderClass being referenced in the volumeMount needs to exist in the same namespace as the application pod.
So both YAML files should be deployed in the same namespace (adding, for example, the -n mynamespace argument to both).
Finally I got it working!
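Concretely, assuming the SecretProviderClass YAML from step 4 is saved as aws-secrets.yaml (file name assumed for illustration), both resources would be applied to the same namespace like this:
kubectl apply -f aws-secrets.yaml -n mynamespace
kubectl apply -f test-deployment.yaml -n mynamespace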

Fixing DataDog agent congestion issues in Amazon EKS cluster

A few months ago I integrated DataDog into my Kubernetes cluster by using a DaemonSet configuration. Since then I've been getting congestion alerts with the following message:
Please tune the hot-shots settings
https://github.com/brightcove/hot-shots#errors
By attempting to follow the docs with my limited Orchestration/DevOps knowledge, what I could gather is that I need to add the following to my DaemonSet config:
spec:
  ...
  securityContext:
    sysctls:
      - name: net.unix.max_dgram_qlen
        value: "1024"
      - name: net.core.wmem_max
        value: "4194304"
I attempted to add that configuration piece to one of the auto-deployed DataDog pods directly just to try it out (instead of adding it to the DaemonSet and risking bringing all agents down), but it hangs indefinitely and doesn't save the configuration.
That hot-shots documentation also mentions that the above sysctl configuration requires unsafe sysctls to be enabled in the nodes that contain the pods:
kubelet --allowed-unsafe-sysctls \
  'net.unix.max_dgram_qlen,net.core.wmem_max'
The cluster I am working with is fully deployed with EKS by using the Dashboard in AWS (I have little knowledge of how it is configured). The above seems to be intended for manually deployed and managed clusters.
Why is the configuration I am attempting to apply to a single DataDog agent pod not saving/applying? Is it because it is managed by the DaemonSet, or because the node doesn't have the proper unsafe sysctls allowed? Something else?
If I do need to enable the suggested unsafe sysctls on all nodes of my cluster, how do I go about it, given that the cluster is fully deployed and managed by Amazon EKS?
So we managed to achieve this using a custom launch template with our managed node group and then passing in a custom bootstrap script. This does mean, however, that you need to supply the AMI ID yourself and lose the alerts in the console when it is outdated. In Terraform this would look like:
resource "aws_eks_node_group" "group" {
...
launch_template {
id = aws_launch_template.nodes.id
version = aws_launch_template.nodes.latest_version
}
...
}
data "template_file" "bootstrap" {
template = file("${path.module}/files/bootstrap.tpl")
vars = {
cluster_name = aws_eks_cluster.cluster.name
cluster_auth_base64 = aws_eks_cluster.cluster.certificate_authority.0.data
endpoint = aws_eks_cluster.cluster.endpoint
}
}
data "aws_ami" "eks_node" {
owners = ["602401143452"]
most_recent = true
filter {
name = "name"
values = ["amazon-eks-node-1.21-v20211008"]
}
}
resource "aws_launch_template" "nodes" {
...
image_id = data.aws_ami.eks_node.id
user_data = base64encode(data.template_file.bootstrap.rendered)
...
}
Then the bootstrap.tpl file looks like this:
#!/bin/bash
set -o xtrace
systemctl stop kubelet
/etc/eks/bootstrap.sh '${cluster_name}' \
  --b64-cluster-ca '${cluster_auth_base64}' \
  --apiserver-endpoint '${endpoint}' \
  --kubelet-extra-args '"--allowed-unsafe-sysctls=net.unix.max_dgram_qlen"'
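Once the node group has rolled onto the new launch template version, one way to confirm the flag took effect is to get a shell on a node (via SSM or SSH, depending on your setup) and inspect the running kubelet arguments:
ps aux | grep kubelet | grep allowed-unsafe-sysctls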
The next step is to set up the PodSecurityPolicy, ClusterRole, and RoleBinding in your cluster so you can use the securityContext as described above; pods in that namespace will then be able to run without a SysctlForbidden message.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: sysctl
spec:
  allowPrivilegeEscalation: false
  allowedUnsafeSysctls:
    - net.unix.max_dgram_qlen
  defaultAllowPrivilegeEscalation: false
  fsGroup:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
    - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: allow-sysctl
rules:
  - apiGroups:
      - policy
    resourceNames:
      - sysctl
    resources:
      - podsecuritypolicies
    verbs:
      - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: allow-sysctl
  namespace: app-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: allow-sysctl
subjects:
  - apiGroup: rbac.authorization.k8s.io
    kind: Group
    name: system:serviceaccounts:app-namespace
If you are using the DataDog Helm chart, you can set the following values to update the securityContext of the agent, but you will have to update the chart's PSP manually to set allowedUnsafeSysctls:
datadog:
  securityContext:
    sysctls:
      - name: net.unix.max_dgram_qlen
        value: "512"

Authenticate an AWS SQS scaler in Keda

I have a Keda deployment that I've been trying to get to work for about a month now. At the moment, my scaler looks like this:
apiVersion: keda.k8s.io/v1alpha1
kind: ScaledObject
metadata:
  name: {service-name}-scaler
spec:
  scaleTargetRef:
    deploymentName: {service-name}
    containerName: {service-name}
  pollingInterval: 30
  cooldownPeriod: 600
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: keda-trigger-authentication
      metadata:
        queueURL: https://sqs.ap-northeast-1.amazonaws.com/{AWS ID}/{Queue-name}
        queueLength: "1"
        awsRegion: "ap-northeast-1"
        identityOwner: pod
The associated trigger authentication and secret are:
apiVersion: v1
kind: Secret
metadata:
  name: keda-secrets
data:
  AWS_ACCESS_KEY_ID: {base64-encoded-string}
  AWS_SECRET_ACCESS_KEY: {base64-encoded-string}
  KEDA_ROLE_ARN: {base64-encoded-string}
---
apiVersion: keda.k8s.io/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-authentication
spec:
  env:
    - parameter: awsRegion
      name: AWS_REGION
    - parameter: awsAccessKeyID
      name: AWS_ACCESS_KEY_ID
    - parameter: awsSecretAccessKey
      name: AWS_SECRET_ACCESS_KEY
    - parameter: awsRoleArn
      name: KEDA_ROLE_ARN
  secretTargetRef:
    - parameter: awsRoleArn
      name: keda-secrets
      key: KEDA_ROLE_ARN
I understand that the KEDA_ROLE_ARN value is repeated here; I left both for debugging purposes. The order of deploying this is as follows:
1. Install common environment variables (this is where the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and KEDA_ROLE_ARN values are stored). The AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY values are listed as AWS_ACCESS_KEY_ID_ASSUME and AWS_SECRET_ACCESS_KEY_ASSUME respectively in the file and will assume their appropriate values on the container. Again, these are duplicated for debugging purposes; I would prefer to use these values rather than a separate secret.
2. Install the KEDA pods with Helm.
3. Deploy the keda-secrets secret and the keda-trigger-authentication trigger authentication.
4. Deploy the container that should be scaled. This is where the AWS_ACCESS_KEY_ID_ASSUME value assumes the name AWS_ACCESS_KEY_ID, the AWS_SECRET_ACCESS_KEY_ASSUME value assumes the name AWS_SECRET_ACCESS_KEY, and the AWS_REGION value is defined.
5. Deploy the scaled object.
For some reason, I keep getting an error from AWS when the scaler attempts to scale saying that there are no credential providers in the chain. It appears that the AWS credentials are not being sent. What am I doing wrong here?
I will show you two ways to successfully scale a deployment based on AWS SQS.
First way: using the AWS IAM role attached to the node.
If your node's IAM role has permission to access SQS, then this becomes easier: you just have to change the identityOwner: pod field to identityOwner: operator so that KEDA can use the node role to access AWS SQS.
Sample ScaledObject file with SQS trigger:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: aws-sqs-queue-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: test-deployment
  minReplicaCount: 0
  maxReplicaCount: 2
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/3243234432432/Queue
        queueLength: "5"
        awsRegion: "us-east-1"
        identityOwner: operator
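For this first way to work, the node role needs at least permission to read the queue attributes that KEDA polls for the queue length. A minimal IAM policy sketch (the ARN is built from the example queue URL above, so treat it as illustrative):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["sqs:GetQueueAttributes"],
      "Resource": "arn:aws:sqs:us-east-1:3243234432432:Queue"
    }
  ]
}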
Second way: using an IAM user.
In this approach, we need to create the objects below:
1. Create an IAM user in AWS.
2. Create a Secret in Kubernetes.
3. Create a TriggerAuthentication in Kubernetes.
4. Create a ScaledObject in Kubernetes.
First create the IAM user and give it SQS permissions. Then encode the IAM user's access key and secret key using base64; these encoded values are required when creating the Kubernetes secret.
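A sketch of the encoding step (placeholder values, not real credentials):
echo -n 'AKIAEXAMPLEKEYID' | base64
echo -n 'exampleSecretAccessKey' | base64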
Create secret
apiVersion: v1
kind: Secret
metadata:
  name: test-secrets
  namespace: default
data:
  AWS_ACCESS_KEY_ID: <base64-encoded-key>
  AWS_SECRET_ACCESS_KEY: <base64-encoded-secret-key>
Create the TriggerAuthentication; this will be used in the ScaledObject:
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-auth-aws-credentials
  namespace: default
spec:
  secretTargetRef:
    - parameter: awsAccessKeyID     # Required.
      name: test-secrets            # Required.
      key: AWS_ACCESS_KEY_ID        # Required.
    - parameter: awsSecretAccessKey # Required.
      name: test-secrets            # Required.
      key: AWS_SECRET_ACCESS_KEY    # Required.
Create the ScaledObject to map KEDA to the deployment you want to scale based on the SQS trigger:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: aws-sqs-queue-scaledobject
  namespace: default
spec:
  scaleTargetRef:
    name: test-deployment
  minReplicaCount: 0
  maxReplicaCount: 2
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: keda-trigger-auth-aws-credentials
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/012345678912/Queue
        queueLength: "5"
        awsRegion: "us-east-1"

How to get cluster subdomain in kubernetes deployment config template

On Kubernetes 1.6.1 (OpenShift 3.6 CP) I'm trying to get the subdomain of my cluster using $(OPENSHIFT_MASTER_DEFAULT_SUBDOMAIN), but it's not dereferencing at runtime. Not sure what I'm doing wrong; the docs show this is how environment parameters should be acquired.
https://v1-6.docs.kubernetes.io/docs/api-reference/v1.6/#container-v1-core
- apiVersion: v1
  kind: DeploymentConfig
  spec:
    template:
      metadata:
        labels:
          deploymentconfig: ${APP_NAME}
        name: ${APP_NAME}
      spec:
        containers:
          - name: myapp
            env:
              - name: CLOUD_CLUSTER_SUBDOMAIN
                value: $(OPENSHIFT_MASTER_DEFAULT_SUBDOMAIN)
The $(VAR) syntax is only expanded for environment variables that are already defined on the container, so you'll need to set that value as an environment variable yourself; this is the usage:
oc set env <object-selection> KEY_1=VAL_1
for example, if your DeploymentConfig is named foo and your subdomain is foo.bar, you would use this command:
oc set env dc/foo OPENSHIFT_MASTER_DEFAULT_SUBDOMAIN=foo.bar
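After that command, the container effectively carries both variables, and Kubernetes can expand the $( ) reference because the referenced variable is now defined on the container (note that a variable must be declared before any variable that references it). A sketch of the resulting env section, using the example values above:
env:
  - name: OPENSHIFT_MASTER_DEFAULT_SUBDOMAIN
    value: foo.bar
  - name: CLOUD_CLUSTER_SUBDOMAIN
    value: $(OPENSHIFT_MASTER_DEFAULT_SUBDOMAIN)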