CloudWatch agent with role instead of access key and secret key - amazon-web-services

I got this deployment from the internet and tested it with great results. My question is: are there config parameters I can use to pass a role ARN instead of an access key and secret key? I tried passing a role ARN in various forms inside aws-credentials, but to no avail.
---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cwagent-prometheus
  namespace: amazon-cloudwatch
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cwagent-prometheus
  template:
    metadata:
      labels:
        app: cwagent-prometheus
    spec:
      containers:
        - name: cloudwatch-agent
          image: amazon/cloudwatch-agent:1.247348.0b251302
          imagePullPolicy: Always
          env:
            - name: CI_VERSION
              value: "k8s/1.3.8"
          volumeMounts:
            - name: prometheus-cwagentconfig
              mountPath: /etc/cwagentconfig
            - name: prometheus-config
              mountPath: /etc/prometheusconfig
            - name: aws-credentials
              mountPath: /root/.aws
      volumes:
        - name: prometheus-cwagentconfig
          configMap:
            name: prometheus-cwagentconfig
        - name: prometheus-config
          configMap:
            name: prometheus-config
        - name: aws-credentials
          secret:
            secretName: aws-credentials
      serviceAccountName: cwagent-prometheus
The typical working solution is to provide aws-credentials with the format:
[AmazonCloudWatchAgent]
aws_access_key_id = $AWS_ID
aws_secret_access_key = $AWS_KEY
For instance, I tried changing it to:
[AmazonCloudWatchAgent]
role_arn = $ROLE_ARN
With this change, the CloudWatch agent complains about not finding aws_access_key_id in the credentials.

This is a known issue and is still unresolved:
Use IAM Roles for Service Accounts issue on amazon-cloudwatch-agent
aws-helm-eks-charts issue
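While that remains open, the usual IRSA pattern on EKS is to annotate the agent's service account with the role ARN and drop the static credentials; whether the agent honours it depends on the agent version, so treat this as a sketch (the account ID and role name are placeholders):
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cwagent-prometheus
  namespace: amazon-cloudwatch
  annotations:
    # Placeholder role ARN; the role's trust policy must allow the
    # cluster's OIDC provider to assume it (IRSA).
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/cwagent-prometheus-role
With IRSA the pod receives AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE automatically, so the aws-credentials secret and its volume mount would be removed from the Deployment.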

Related

Google Kubernetes Engine & Github actions deploy deployments.apps "gke-deployment" not found

I've been trying to run the Google Kubernetes Engine deploy action for my GitHub repo.
I have a GitHub workflow job running and everything works just fine except the deploy step.
Here is my error code:
Error from server (NotFound): deployments.apps "gke-deployment" not found
I'm assuming my YAML files are at fault. I'm fairly new to this, so I got them from the internet and just edited them a bit to fit my code, but I don't know the details.
Kustomize.yaml:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
metadata:
  name: arbitrary
# Example configuration for the webserver
# at https://github.com/monopole/hello
commonLabels:
  app: videoo-render
resources:
  - deployment.yaml
  - service.yaml
deployment.yaml (I think the error is here):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: the-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      deployment: video-render
  template:
    metadata:
      labels:
        deployment: video-render
    spec:
      containers:
        - name: the-container
          image: monopole/hello:1
          command: ["/video-render",
                    "--port=8080",
                    "--enableRiskyFeature=$(ENABLE_RISKY)"]
          ports:
            - containerPort: 8080
          env:
            - name: ALT_GREETING
              valueFrom:
                configMapKeyRef:
                  name: the-map
                  key: altGreeting
            - name: ENABLE_RISKY
              valueFrom:
                configMapKeyRef:
                  name: the-map
                  key: enableRisky
service.yaml:
kind: Service
apiVersion: v1
metadata:
  name: the-service
spec:
  selector:
    deployment: video-render
  type: LoadBalancer
  ports:
    - protocol: TCP
      port: 8666
      targetPort: 8080
I'm using the ubuntu-20.04 image; the repo is C++ code.
For anyone wondering why this happens:
You have to edit this line in the workflow so it points to an existing deployment:
DEPLOYMENT_NAME: gke-deployment # TODO: update to deployment name,
to:
DEPLOYMENT_NAME: existing-deployment-name
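That variable lives in the env block at the top of the GKE starter workflow; a minimal sketch of what it looks like (the cluster, zone, and image names are placeholders, and DEPLOYMENT_NAME must match metadata.name in deployment.yaml, i.e. the-deployment above):
env:
  PROJECT_ID: ${{ secrets.GKE_PROJECT }}
  GKE_CLUSTER: my-cluster          # placeholder: your cluster name
  GKE_ZONE: europe-west1-b         # placeholder: your cluster zone
  DEPLOYMENT_NAME: the-deployment  # matches metadata.name in deployment.yaml above
  IMAGE: static-site               # placeholder: your image name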

Unable to connect to AWS services from EKS

I have created an EKS cluster using eksctl. I am following these steps to establish connectivity to AWS services like S3 and CloudWatch from a Spring Boot app:
1. Create the EKS cluster using eksctl - this has my service account details and OIDC enabled.
2. List the service accounts to see if they were created fine.
3. Create a deployment using the service account name.
4. Create a service.
I am seeing a 403 in the logs:
User: arn:aws:sts:xxx/xxxx is not authorized to perform:
cloudformation:DescribeStackResources because no identity-based policy allows
the cloudformation:DescribeStackResources action (Service: AmazonCloudFormation; Status Code: 403;
Error Code: AccessDenied; Request ID: xxxx)
Can I get some help here to troubleshoot this issue, please?
What I have figured out after posting this issue is that my node, which is provisioned by eksctl, has an IAM role applied to it, and that is the role my app is picking up via the default credential chain.
What I still haven't figured out is how to make the apps in the pod assume the service account role instead.
YAML for #1
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: name
  region: ap-south-1
availabilityZones: ["xxxx", "xxxx", "xxxx"]
managedNodeGroups:
  - name: c5large-nodes
    desiredCapacity: 1
    instanceType: c5.large
    labels:
      node-type: large
    volumeSize: 5
cloudWatch:
  clusterLogging:
    enableTypes: [ "*" ]
iam:
  withOIDC: true
  serviceAccounts:
    - metadata:
        name: cluster-autoscaler
        namespace: kube-system
        labels: {aws-usage: "autoscaling"}
      wellKnownPolicies:
        autoScaler: true
      roleName: eksctl-cluster-autoscaler-role
      roleOnly: true
    - metadata:
        name: backend-stage-iam-role
        namespace: backend-stage
        labels: { aws-usage: "all-backend-allow" }
      attachPolicyARNs:
        - "arn:aws:iam::xxxx"
    - metadata:
        name: mq-access
        namespace: backend-stage
        labels: { aws-usage: "MQ" }
      attachPolicyARNs:
        - "arn:aws:iam::aws:policy/AmazonMQFullAccess"
YAML for deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
  namespace: backend-stage
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: my-app
    spec:
      serviceAccountName: backend-stage-iam-role
      containers:
        - image: xxx/my-app:latest
          imagePullPolicy: Always
          name: my-app
          ports:
            - containerPort: 8080
              protocol: TCP
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: stage
YAML for service
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: backend-stage
spec:
  selector:
    app: my-app
  type: LoadBalancer
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
The role is defined like this for now:
- Effect: Allow
  Action:
    - cloudformation:*
  Resource: "*"
I did further debugging; by describing the pod, I can see the role passed as an env variable:
AWS_ROLE_ARN: arn:aws:iam::MYACCOUNT:role/MyRole
Just add the missing permission to the arn:aws:sts:xxx/xxxx assumed role.
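If that role is the one eksctl creates for backend-stage-iam-role, one way to grant the action named in the error is an inline policy on that service account entry (a sketch against the ClusterConfig above, assuming eksctl's attachPolicy field; the existing ARN is kept as-is):
    - metadata:
        name: backend-stage-iam-role
        namespace: backend-stage
        labels: { aws-usage: "all-backend-allow" }
      attachPolicyARNs:
        - "arn:aws:iam::xxxx"
      attachPolicy:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action:
              - "cloudformation:DescribeStackResources"
            Resource: "*"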

RabbitMQ only shows one node

I have been trying to set up RabbitMQ on a k8s cluster. I finally got everything set up, but only one node shows up in the management UI. Here are my steps:
1. Dockerfile Setup
I do this to enable autocluster:
FROM rabbitmq:3.8-rc-management-alpine
MAINTAINER kevlai
RUN rabbitmq-plugins --offline enable rabbitmq_peer_discovery_k8s
2. Set up RBAC
apiVersion: v1
kind: ServiceAccount
metadata:
  name: borecast-rabbitmq
  namespace: borecast-production
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: borecast-rabbitmq
  namespace: borecast-production
rules:
  - apiGroups:
      - ""
    resources:
      - endpoints
    verbs:
      - get
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: borecast-rabbitmq
  namespace: borecast-production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: dev
subjects:
  - kind: ServiceAccount
    name: borecast-rabbitmq
    namespace: borecast-production
3. Set up Secrets
apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-secret
  namespace: borecast-production
type: Opaque
data:
  username: a2V2
  password: Ym9yZWNhc3RydWx6
  secretCookie: c2VjcmV0Y29va2llaGVyZQ==
4. Set up StorageClass
I'm setting up a StorageClass so k8s will automatically do the provisioning for me on AWS.
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: rabbitmq-sc
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zone: us-east-2a
reclaimPolicy: Retain
5. Set up StatefulSets and Services
You can see there are two services. The headless service is for the pods themselves. As for the management service, I'll expose it through an Ingress controller so it is accessible from outside.
---
apiVersion: v1
kind: Service
metadata:
  name: borecast-rabbitmq-management-service
  namespace: borecast-production
  labels:
    app: borecast-rabbitmq
spec:
  ports:
    - port: 15672
      targetPort: 15672
      name: http
    - port: 5672
      targetPort: 5672
      name: amqp
  selector:
    app: borecast-rabbitmq
---
apiVersion: v1
kind: Service
metadata:
  name: borecast-rabbitmq-service
  namespace: borecast-production
  labels:
    app: borecast-rabbitmq
spec:
  clusterIP: None
  ports:
    - port: 5672
      name: amqp
  selector:
    app: borecast-rabbitmq
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: borecast-rabbitmq
  namespace: borecast-production
spec:
  serviceName: borecast-rabbitmq-service
  replicas: 3
  template:
    metadata:
      labels:
        app: borecast-rabbitmq
    spec:
      serviceAccountName: borecast-rabbitmq
      containers:
        - image: docker.borecast.com/borecast-rabbitmq:v1.0.3
          name: borecast-rabbitmq
          imagePullPolicy: Always
          resources:
            requests:
              memory: "256Mi"
              cpu: "150m"
            limits:
              memory: "512Mi"
              cpu: "250m"
          ports:
            - containerPort: 5672
              name: amqp
          env:
            - name: RABBITMQ_DEFAULT_USER
              valueFrom:
                secretKeyRef:
                  name: rabbitmq-secret
                  key: username
            - name: RABBITMQ_DEFAULT_PASS
              valueFrom:
                secretKeyRef:
                  name: rabbitmq-secret
                  key: password
            - name: RABBITMQ_ERLANG_COOKIE
              valueFrom:
                secretKeyRef:
                  name: rabbitmq-secret
                  key: secretCookie
            - name: MY_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: K8S_SERVICE_NAME
              # value: borecast-rabbitmq-service.borecast-production.svc.cluster.local
              value: borecast-rabbitmq-service
            - name: RABBITMQ_USE_LONGNAME
              value: "true"
            - name: RABBITMQ_NODENAME
              value: "rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME)"
              # value: rabbit@$(MY_POD_NAME).borecast-rabbitmq-service.borecast-production.svc.cluster.local
            - name: RABBITMQ_NODE_TYPE
              value: disc
            - name: AUTOCLUSTER_TYPE
              value: "k8s"
            - name: AUTOCLUSTER_DELAY
              value: "10"
            - name: AUTOCLUSTER_CLEANUP
              value: "true"
            - name: CLEANUP_WARN_ONLY
              value: "false"
            - name: K8S_ADDRESS_TYPE
              value: "hostname"
            - name: K8S_HOSTNAME_SUFFIX
              value: ".$(K8S_SERVICE_NAME)"
              # value: .borecast-rabbitmq-service.borecast-production.svc.cluster.local
          volumeMounts:
            - name: rabbitmq-volume
              mountPath: /var/lib/rabbitmq
      imagePullSecrets:
        - name: regcred
  volumeClaimTemplates:
    - metadata:
        name: rabbitmq-volume
        namespace: borecast-production
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: rabbitmq-sc
        resources:
          requests:
            storage: 5Gi
Problem
Everything is working. However, when I access the management UI (i.e. the borecast-rabbitmq-management-service on port 15672), I only see one node when there should be three.
Also notice that the cluster name is
rabbit@borecast-rabbitmq-0.borecast-rabbitmq-service.borecast-production.svc.cluster.local
but when I log out and log in again, the 0 sometimes changes to 1 or 2.
Also notice that the node name is
rabbit@borecast-rabbitmq-1.borecast-rabbitmq-service
and, you guessed it, the 1 is sometimes 2 or 0.
I have been trying to debug, but to no avail. The logs for each pod don't raise any suspicions, and every service and StatefulSet is working normally. I repeated the five steps multiple times, and if your cluster is on AWS, you can replicate my setup by following the steps (after creating the borecast-production namespace, of course). If anybody can shed some light on the matter, I'll be eternally grateful.
The problem is with the headless service name definition:
- name: K8S_SERVICE_NAME
  # value: borecast-rabbitmq-service.borecast-production.svc.cluster.local
  value: borecast-rabbitmq-service
which is a building block of the node name:
- name: RABBITMQ_NODENAME
  value: "rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME)"
whereas the proper node name should be the FQDN of the Pod (<statefulset name>-<ordinal index>.<headless_svc_name>.<namespace>.svc.cluster.local):
- name: RABBITMQ_NODENAME
  value: "rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local"
Therefore you ended up with the node name
borecast-rabbitmq-1.borecast-rabbitmq-service
instead of:
borecast-rabbitmq-1.borecast-rabbitmq-service.borecast-production.svc.cluster.local
Look up the FQDN of the Pods created by the borecast-rabbitmq StatefulSet (in other words: the SRV records of the Pods) with the nslookup utility from inside your cluster, as explained here, to see what form the RABBITMQ_NODENAME is expected to have.
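Note that $(MY_POD_NAMESPACE) is not defined in the StatefulSet above, so it has to be added next to MY_POD_NAME via the downward API; a minimal sketch:
- name: MY_POD_NAMESPACE
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace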
Try exposing port 4369 on the headless service; see the port access section of
https://www.rabbitmq.com/clustering.html
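In practice that means adding the Erlang distribution ports to the headless service, for example (a sketch; 4369 is epmd and 25672 is the default inter-node/CLI distribution port):
apiVersion: v1
kind: Service
metadata:
  name: borecast-rabbitmq-service
  namespace: borecast-production
  labels:
    app: borecast-rabbitmq
spec:
  clusterIP: None
  ports:
    - port: 5672
      name: amqp
    - port: 4369
      name: epmd
    - port: 25672
      name: dist
  selector:
    app: borecast-rabbitmq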
I had the same issue, and it came down to deleting all the RabbitMQ resources, including the PVCs created under the StatefulSet, and then reinstalling everything from the manifests.

Failed to connect to cloud SQL using proxy despite following the documentation

I'm using the Doctrine ORM with Symfony, the PHP framework. I'm getting bizarre behaviour when trying to connect to Cloud SQL from GKE.
I'm able to get a connection to the DB via doctrine on command line, for example php bin/console doctrine:database:create is successful and I can see a connection opened in the proxy pod logs.
But when I try and connect to the DB via doctrine in my application I run into this error without fail:
An exception occurred in driver: SQLSTATE[HY000] [2002] php_network_getaddresses: getaddrinfo failed: Name or service not known
I have been trying to get my head around this, but it doesn't make sense: why would I be able to connect via the command line but not in my application?
I followed the documentation here for setting up a DB connection using the Cloud SQL proxy. This is my Kubernetes deployment:
---
apiVersion: "extensions/v1beta1"
kind: "Deployment"
metadata:
  name: "riptides-api"
  namespace: "default"
  labels:
    app: "riptides-api"
    microservice: "riptides"
spec:
  replicas: 3
  selector:
    matchLabels:
      app: "riptides-api"
      microservice: "riptides"
  template:
    metadata:
      labels:
        app: "riptides-api"
        microservice: "riptides"
    spec:
      containers:
        - name: "api-sha256"
          image: "eu.gcr.io/riptides/api@sha256:ce0ead9d1dd04d7bfc129998eca6efb58cb779f4f3e41dcc3681c9aac1156867"
          env:
            - name: DB_HOST
              value: 127.0.0.1:3306
            - name: DB_USER
              valueFrom:
                secretKeyRef:
                  name: riptides-mysql-user-skye
                  key: user
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: riptides-mysql-user-skye
                  key: password
            - name: DB_NAME
              value: riptides
          lifecycle:
            postStart:
              exec:
                command: ["/bin/bash", "-c", "php bin/console doctrine:migrations:migrate -n"]
          volumeMounts:
            - name: keys
              mountPath: "/app/config/jwt"
              readOnly: true
        - name: cloudsql-proxy
          image: gcr.io/cloudsql-docker/gce-proxy:1.11
          command: ["/cloud_sql_proxy",
                    "-instances=riptides:europe-west4:riptides-sql=tcp:3306",
                    "-credential_file=/secrets/cloudsql/credentials.json"]
          # [START cloudsql_security_context]
          securityContext:
            runAsUser: 2  # non-root user
            allowPrivilegeEscalation: false
          # [END cloudsql_security_context]
          volumeMounts:
            - name: riptides-mysql-service-account
              mountPath: /secrets/cloudsql
              readOnly: true
      volumes:
        - name: keys
          secret:
            secretName: riptides-api-keys
            items:
              - key: private.pem
                path: private.pem
              - key: public.pem
                path: public.pem
        - name: riptides-mysql-service-account
          secret:
            secretName: riptides-mysql-service-account
---
apiVersion: "autoscaling/v2beta1"
kind: "HorizontalPodAutoscaler"
metadata:
  name: "riptides-api-hpa"
  namespace: "default"
  labels:
    app: "riptides-api"
    microservice: "riptides"
spec:
  scaleTargetRef:
    kind: "Deployment"
    name: "riptides-api"
    apiVersion: "apps/v1beta1"
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: "Resource"
      resource:
        name: "cpu"
        targetAverageUtilization: 70
If anyone has any suggestions, I'd be forever grateful.
It doesn't look like anything is wrong with your k8s YAML; the problem is more likely in how you are connecting using Symfony. According to the documentation here, Symfony expects the DB URI to be passed in through an environment variable called "DATABASE_URL". See the following example:
# customize this line!
DATABASE_URL="postgres://db_user:db_password@127.0.0.1:5432/db_name"
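In a Kubernetes deployment that URL can be assembled from the secrets already mounted as env vars, since Kubernetes expands $(VAR) references to variables defined earlier in the same list; a rough sketch for the MySQL proxy sidecar above (the mysql:// URL shape is an assumption about the Doctrine driver in use):
env:
  - name: DB_USER
    valueFrom:
      secretKeyRef:
        name: riptides-mysql-user-skye
        key: user
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: riptides-mysql-user-skye
        key: password
  - name: DATABASE_URL
    # $(VAR) expansion works because DB_USER and DB_PASSWORD are defined above
    value: "mysql://$(DB_USER):$(DB_PASSWORD)@127.0.0.1:3306/riptides"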
This was happening because Doctrine was using default values instead of the environment variables (which should have overridden them) that I had set up in my deployment. I changed the environment variable names to be different from the default ones and it works.

kubectl - cert manager - credentials not found

I want to have TLS termination enabled on ingress (on top of kubernetes) on google cloud platform.
My ingress cluster is working, but my cert manager is failing with the error message:
textPayload: "2018/07/05 22:04:00 Error while processing certificate during sync: Error while creating ACME client for 'domain': Error while initializing challenge provider googlecloud: Unable to get Google Cloud client: google: error getting credentials using GOOGLE_APPLICATION_CREDENTIALS environment variable: open /opt/google/kube-cert-manager.json: no such file or directory
"
This is what I did in order to get into the current state:
created cluster, deployment, service, ingress
executed:
gcloud --project 'project' iam service-accounts create kube-cert-manager-sv-security --display-name "kube-cert-manager-sv-security"
gcloud --project 'project' iam service-accounts keys create ~/.config/gcloud/kube-cert-manager-sv-security.json --iam-account kube-cert-manager-sv-security@'project'.iam.gserviceaccount.com
gcloud --project 'project' projects add-iam-policy-binding --member serviceAccount:kube-cert-manager-sv-security@'project'.iam.gserviceaccount.com --role roles/dns.admin
kubectl create secret generic kube-cert-manager-sv-security-secret --from-file=/home/perre/.config/gcloud/kube-cert-manager-sv-security.json
and created the following resources:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kube-cert-manager-sv-security-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-cert-manager-sv-security
  namespace: default
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: kube-cert-manager-sv-security
rules:
  - apiGroups: ["*"]
    resources: ["certificates", "ingresses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["*"]
    resources: ["secrets"]
    verbs: ["get", "list", "create", "update", "delete"]
  - apiGroups: ["*"]
    resources: ["events"]
    verbs: ["create"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: kube-cert-manager-sv-security-service-account
subjects:
  - kind: ServiceAccount
    namespace: default
    name: kube-cert-manager-sv-security
roleRef:
  kind: ClusterRole
  name: kube-cert-manager-sv-security
  apiGroup: rbac.authorization.k8s.io
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: certificates.stable.k8s.psg.io
spec:
  scope: Namespaced
  group: stable.k8s.psg.io
  version: v1
  names:
    kind: Certificate
    plural: certificates
    singular: certificate
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: kube-cert-manager-sv-security
  name: kube-cert-manager-sv-security
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: kube-cert-manager-sv-security
      name: kube-cert-manager-sv-security
    spec:
      serviceAccount: kube-cert-manager-sv-security
      containers:
        - name: kube-cert-manager
          env:
            - name: GCE_PROJECT
              value: solidair-vlaanderen-207315
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /opt/google/kube-cert-manager.json
          image: bcawthra/kube-cert-manager:2017-12-10
          args:
            - "-data-dir=/var/lib/cert-manager-sv-security"
            #- "-acme-url=https://acme-staging.api.letsencrypt.org/directory"
            # NOTE: the URL above points to the staging server, where you won't get real certs.
            # Uncomment the line below to use the production LetsEncrypt server:
            - "-acme-url=https://acme-v01.api.letsencrypt.org/directory"
            # You can run multiple instances of kube-cert-manager for the same namespace(s),
            # each watching for a different value for the 'class' label
            - "-class=kube-cert-manager"
            # You can choose to monitor only some namespaces, otherwise all namespaces will be monitored
            #- "-namespaces=default,test"
            # If you set a default email, you can omit the field/annotation from Certificates/Ingresses
            - "-default-email=viae.it@gmail.com"
            # If you set a default provider, you can omit the field/annotation from Certificates/Ingresses
            - "-default-provider=googlecloud"
          volumeMounts:
            - name: data-sv-security
              mountPath: /var/lib/cert-manager-sv-security
            - name: google-application-credentials
              mountPath: /opt/google
      volumes:
        - name: data-sv-security
          persistentVolumeClaim:
            claimName: kube-cert-manager-sv-security-data
        - name: google-application-credentials
          secret:
            secretName: kube-cert-manager-sv-security-secret
Does anyone know what I'm missing?
Your secret resource kube-cert-manager-sv-security-secret probably contains a JSON file named kube-cert-manager-sv-security.json, which does not match the GOOGLE_APPLICATION_CREDENTIALS value. You can confirm the file name in the secret resource with kubectl get secret YOUR-SECRET-NAME -o yaml.
If you change the file path to the actual file name, kube-cert-manager works fine:
- name: GOOGLE_APPLICATION_CREDENTIALS
  # value: /opt/google/kube-cert-manager.json
  value: /opt/google/kube-cert-manager-sv-security.json
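Alternatively, the key name inside the secret can be pinned when creating it, so the original /opt/google/kube-cert-manager.json path would resolve as-is; a sketch reusing the command from the question:
kubectl create secret generic kube-cert-manager-sv-security-secret --from-file=kube-cert-manager.json=/home/perre/.config/gcloud/kube-cert-manager-sv-security.json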