Connecting Airflow Operator with Google Cloud Bucket - google-cloud-platform

I am running an GKE Airflow Operator from market place and I have not idea as how to store my dags in google cloud bucket while deploying them in containers.
I tried the following, but it's not reflecting on the bucket
kubectl apply -f - <<EOF
apiVersion: airflow.k8s.io/v1alpha1
kind: AirflowCluster
metadata:
name: airflow-cluster
spec:
executor: Celery
config:
airflow:
AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: 30 # default is 300s
redis:
operator: False
scheduler:
version: "1.10.0rc2"
ui:
replicas: 1
worker:
replicas: 2
dags:
subdir: "/dags"
gcs:
bucket: "gs://airflowdags"
airflowbase:
name: airflow-base
EOF

Related

AWS EKS secrets store provider - Failed to fetch secret from all regions: name

I am using the AWS secrets store CSI provider to sync secrets from the AWS Secret Manager into Kubernetes/EKS.
The SecretProviderClass is:
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
name: test-provider
spec:
provider: aws
parameters:
objects: |
- objectName: mysecret
objectType: secretsmanager
jmesPath:
- path: APP_ENV
objectAlias: APP_ENV
- path: APP_DEBUG
objectAlias: APP_DEBUG
And the Pod mounting these secrets is:
apiVersion: v1
kind: Pod
metadata:
name: secret-pod
spec:
restartPolicy: Never
serviceAccountName: my-account
terminationGracePeriodSeconds: 2
containers:
- name: dotfile-test-container
image: registry.k8s.io/busybox
volumeMounts:
- name: secret-volume
readOnly: true
mountPath: "/mnt/secret-volume"
volumes:
- name: secret-volume
csi:
driver: secrets-store.csi.k8s.io
readOnly: true
volumeAttributes:
secretProviderClass: test-provider
The secret exists in the Secret Provider:
{
"APP_ENV": "staging",
"APP_DEBUG": false
}
(this is an example, I am aware I do not need to store these particular variables as secrets)
But when I create the resources, the Pod fails to run with
Warning
FailedMount
96s (x10 over 5m47s)
kubelet
MountVolume.SetUp failed for volume "secret-volume" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod pace/secret-dotfiles-pod,
err: rpc error: code = Unknown desc = Failed to fetch secret from all regions: mysecret
Turns out the error message is very misleading. The problem in my case was due to the type of the APP_DEBUG value. Changing it from a boolean to string
fixed the problem and now the pod starts correctly.
{
"APP_ENV": "staging",
"APP_DEBUG": "false"
}
Seems like a bug in the provider to me.

EKS: can't see nodes and nodes are not join to the cluster

I read all aws articles. I followed each one by one. But it didn't work any of them. Let me briefly summarize my situation. I created EKS automation with terraform. 1 vpc, 3 public subnets, 3 private subnets, 3 security group, 1 nat gateway(on public), and 2 autoscaled worker node groups. I checked all infra which created with terraform. There are no problem.
My main problem is that after the installation I can't see the nodes and nodes are not join to the cluster. I applied below steps but didn't worked. What should I do? By the way don't tag my question as a duplication I checked all similar questions on stackoverflow. My steps look true but does not work.
kubectl get nodes
No resources found
Before checking node with above command.Firstly I applied below command for setting kubeconfig.
aws eks update-kubeconfig --name eks-DS7h --region us-east-1
Here my kubeconfig:
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: LS0tLS1CRUdJfgzsfhadfzasdfrzsd.........
server: https://0F97E579A.gr7.us-east-1.eks.amazonaws.com
name: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
contexts:
- context:
cluster: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
user: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
name: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
current-context: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
kind: Config
preferences: {}
users:
- name: arn:aws:eks:us-east-1:545153234644:cluster/eks-DS7h
user:
exec:
apiVersion: client.authentication.k8s.io/v1beta1
args:
- --region
- us-east-1
- eks
- get-token
- --cluster-name
- eks-DS7h
command: aws
After this I checked the nodes again but I still get no resource found. Than I try to edit aws-auth. Before the edit I check my user on the terminal where I triggered all terraform steps installation.
aws sts get-caller-identity
{
"UserId": "ASDFGSDFGDGSDGDFHSFDSDC",
"Account": "545153234644",
"Arn": "arn:aws:iam::545153234644:user/white"
}
I took my user info and I added blank mapuser area in aws-auth. But still getting No resources found.
kubectl get cm -n kube-system aws-auth
apiVersion: v1
data:
mapAccounts: |
[]
mapRoles: |
- "groups":
- "system:bootstrappers"
- "system:nodes"
- "system:masters"
"rolearn": "arn:aws:iam::545153234644:role/eks-DS7h22060508195731770000000e"
"username": "system:node:{{EC2PrivateDNSName}}"
mapUsers: "- \"userarn\": \"arn:aws:iam::545153234644:user/white\"\n \"username\":
\"white\"\n \"groups\":\n - \"system:masters\"\n - \"system:nodes\" \n"
kind: ConfigMap
metadata:
creationTimestamp: "2022-06-05T08:20:02Z"
labels:
app.kubernetes.io/managed-by: Terraform
terraform.io/module: terraform-aws-modules.eks.aws
name: aws-auth
namespace: kube-system
resourceVersion: "4976"
uid: b12341-33ff-4f78-af0a-758f88
Oh also when I check EKS cluster on dashboard I see below warning too. I don't know is it relevant or not. I want to share it too maybe it will help.

use AWS Secrets & Configuration Provider for EKS: Error from server (BadRequest)

I'm following this AWS documentation which explains how to properly configure AWS Secrets Manager to let it works with EKS through Kubernetes Secrets.
I successfully followed step by step all the different commands as explained in the documentation.
The only difference I get is related to this step where I have to run:
kubectl get po --namespace=kube-system
The expected output should be:
csi-secrets-store-qp9r8 3/3 Running 0 4m
csi-secrets-store-zrjt2 3/3 Running 0 4m
but instead I get:
csi-secrets-store-provider-aws-lxxcz 1/1 Running 0 5d17h
csi-secrets-store-provider-aws-rhnc6 1/1 Running 0 5d17h
csi-secrets-store-secrets-store-csi-driver-ml6jf 3/3 Running 0 5d18h
csi-secrets-store-secrets-store-csi-driver-r5cbk 3/3 Running 0 5d18h
As you can see the names are different, but I'm quite sure it's ok :-)
The real problem starts here in step 4: I created the following YAML file (as you ca see I added some parameters):
apiVersion: secrets-store.csi.x-k8s.io/v1alpha1
kind: SecretProviderClass
metadata:
name: aws-secrets
spec:
provider: aws
parameters:
objects: |
- objectName: "mysecret"
objectType: "secretsmanager"
And finally I created a deploy (as explain here in step 5) using the following yaml file:
# test-deployment.yaml
kind: Pod
apiVersion: v1
metadata:
name: nginx-secrets-store-inline
spec:
serviceAccountName: iamserviceaccountforkeyvaultsecretmanagerresearch
containers:
- image: nginx
name: nginx
volumeMounts:
- name: mysecret-volume
mountPath: "/mnt/secrets-store"
readOnly: true
volumes:
- name: mysecret-volume
csi:
driver: secrets-store.csi.k8s.io
readOnly: true
volumeAttributes:
secretProviderClass: "aws-secrets"
After the deployment through the command:
kubectl apply -f test-deployment.yaml -n mynamespace
The pod is not able to start properly because the following error is generated:
Error from server (BadRequest): container "nginx" in pod "nginx-secrets-store-inline" is waiting to start: ContainerCreating
But, for example, if I run the deployment with the following yaml the POD will be successfully created
# test-deployment.yaml
kind: Pod
apiVersion: v1
metadata:
name: nginx-secrets-store-inline
spec:
serviceAccountName: iamserviceaccountforkeyvaultsecretmanagerresearch
containers:
- image: nginx
name: nginx
volumeMounts:
- name: keyvault-credential-volume
mountPath: "/mnt/secrets-store"
readOnly: true
volumes:
- name: keyvault-credential-volume
emptyDir: {} # <<== !! LOOK HERE !!
as you can see I used
emptyDir: {}
So as far I can see the problem here is related to the following YAML lines:
csi:
driver: secrets-store.csi.k8s.io
readOnly: true
volumeAttributes:
secretProviderClass: "aws-secrets"
To be honest it's even not clear in my mind what's happing here.
Probably I didn't properly enabled the volume permission in EKS?
Sorry but I'm a newbie in both AWS and Kubernetes configurations.
Thanks for you time
--- NEW INFO ---
If I run
kubectl describe pod nginx-secrets-store-inline -n mynamespace
where nginx-secrets-store-inline is the name of the pod, I get the following output:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 30s default-scheduler Successfully assigned mynamespace/nginx-secrets-store-inline to ip-10-0-24-252.eu-central-1.compute.internal
Warning FailedMount 14s (x6 over 29s) kubelet MountVolume.SetUp failed for volume "keyvault-credential-volume" : rpc error: code = Unknown desc = failed to get secretproviderclass mynamespace/aws-secrets, error: SecretProviderClass.secrets-store.csi.x-k8s.io "aws-secrets" not found
Any hints?
Finally I realized why it wasn't working. As explained here, the error:
Warning FailedMount 3s (x4 over 6s) kubelet, kind-control-plane MountVolume.SetUp failed for volume "secrets-store-inline" : rpc error: code = Unknown desc = failed to get secretproviderclass default/azure, error: secretproviderclasses.secrets-store.csi.x-k8s.io "azure" not found
is related to namespace:
The SecretProviderClass being referenced in the volumeMount needs to exist in the same namespace as the application pod.
So both the yaml file should be deployed in the same namespace (adding, for example, the -n mynamespace argument).
Finally I got it working!

Fixing DataDog agent congestion issues in Amazon EKS cluster

A few months ago I integrated DataDog into my Kubernetes cluster by using a DaemonSet configuration. Since then I've been getting congestion alerts with the following message:
Please tune the hot-shots settings
https://github.com/brightcove/hot-shots#errors
By attempting to follow the docs with my limited Orchestration/DevOps knowledge, what I could gather is that I need to add the following to my DaemonSet config:
spec
.
.
securityContext:
sysctls:
- name: net.unix.max_dgram_qlen
value: "1024"
- name: net.core.wmem_max
value: "4194304"
I attempted to add that configuration piece to one of the auto-deployed DataDog pods directly just to try it out but it hangs indefinitely and doesn't save the configuration (Instead of adding to DaemonSet and risking bringing all agents down).
That hot-shots documentation also mentions that the above sysctl configuration requires unsafe sysctls to be enabled in the nodes that contain the pods:
kubelet --allowed-unsafe-sysctls \
'net.unix.max_dgram_qlen, net.core.wmem_max'
The cluster I am working with is fully deployed with EKS by using the Dashboard in AWS (Little knowledge on how it is configured). The above seems to be indicated for manually deployed and managed cluster.
Why is the configuration I am attempting to apply to a single DataDog agent pod not saving/applying? Is it because it is managed by DaemonSet or is it because it doesn't have the proper unsafe sysctl allowed? Something else?
If I do need to enable the suggested unsafe sysctlon all nodes of my cluster. How do I go about it since the cluster is fully deployed and managed by Amazon EKS?
So we managed to achieve this using a custom launch template with our managed node group and then passing in a custom bootstrap script. This does mean however you need to supply the AMI id yourself and lose the alerts in the console when it is outdated. In Terraform this would look like:
resource "aws_eks_node_group" "group" {
...
launch_template {
id = aws_launch_template.nodes.id
version = aws_launch_template.nodes.latest_version
}
...
}
data "template_file" "bootstrap" {
template = file("${path.module}/files/bootstrap.tpl")
vars = {
cluster_name = aws_eks_cluster.cluster.name
cluster_auth_base64 = aws_eks_cluster.cluster.certificate_authority.0.data
endpoint = aws_eks_cluster.cluster.endpoint
}
}
data "aws_ami" "eks_node" {
owners = ["602401143452"]
most_recent = true
filter {
name = "name"
values = ["amazon-eks-node-1.21-v20211008"]
}
}
resource "aws_launch_template" "nodes" {
...
image_id = data.aws_ami.eks_node.id
user_data = base64encode(data.template_file.bootstrap.rendered)
...
}
Then the bootstrap.hcl file looks like this:
#!/bin/bash
set -o xtrace
systemctl stop kubelet
/etc/eks/bootstrap.sh '${cluster_name}' \
--b64-cluster-ca '${cluster_auth_base64}' \
--apiserver-endpoint '${endpoint}' \
--kubelet-extra-args '"--allowed-unsafe-sysctls=net.unix.max_dgram_qlen"'
The next step is to set up the PodSecurityPolicy, ClusterRole and RoleBinding in your cluster so you can use the securityContext as you described above and then pods in that namespace will be able to run without a SysctlForbidden message.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
name: sysctl
spec:
allowPrivilegeEscalation: false
allowedUnsafeSysctls:
- net.unix.max_dgram_qlen
defaultAllowPrivilegeEscalation: false
fsGroup:
rule: RunAsAny
runAsUser:
rule: RunAsAny
seLinux:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
volumes:
- '*'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: allow-sysctl
rules:
- apiGroups:
- policy
resourceNames:
- sysctl
resources:
- podsecuritypolicies
verbs:
- '*'
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: allow-sysctl
namespace: app-namespace
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: allow-sysctl
subjects:
- apiGroup: rbac.authorization.k8s.io
kind: Group
name: system:serviceaccounts:app-namespace
If using the DataDog Helm chart you can set the following values to update the securityContext of the agent. But you will have to update the chart PSP manually to set allowedUnsafeSysctls
datadog:
securityContext:
sysctls:
- name: net.unix.max_dgram_qlen"
value: 512"

K8S Unable to mount AWS EBS as a persistent volume for pod

Question
Please suggest the cause of the error of not being able to mount AWS EBS volume in pod.
journalctl -b -f -u kubelet
1480 kubelet.go:1625] Unable to mount volumes for pod "nginx_default(ddc938ee-edda-11e7-ae06-06bb783bb15c)": timeout expired waiting for volumes to attach/mount for pod "default"/"nginx". list of unattached/unmounted volumes=[ebs]; skipping pod
1480 pod_workers.go:186] Error syncing pod ddc938ee-edda-11e7-ae06-06bb783bb15c ("nginx_default(ddc938ee-edda-11e7-ae06-06bb783bb15c)"), skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"nginx". list of unattached/unmounted volumes=[ebs]
1480 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "pv-ebs" (UniqueName: "kubernetes.io/aws-ebs/vol-0d275986ce24f4304") pod "nginx" (UID: "ddc938ee-edda-11e7-ae06-06bb783bb15c")
1480 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/aws-ebs/vol-0d275986ce24f4304\"" failed. No retries permitted until 2017-12-31 03:34:03.644604131 +0000 UTC m=+6842.543441523 (durationBeforeRetry 2m2s). Error: "Volume not attached according to node status for volume \"pv-ebs\" (UniqueName: \"kubernetes.io/aws-ebs/vol-0d275986ce24f4304\") pod \"nginx\" (UID: \"ddc938ee-edda-11e7-ae06-06bb783bb15c\") "
Steps
Deployed K8S 1.9 using kubeadm (without EBS volume mount, pods work) in AWS (us-west-1 and AZ is us-west-1b).
Configure an IAM role as per Kubernetes - Cloud Providers and kubelets failing to start when using 'aws' as cloud provider.
Assign the IAM role to EC2 instances as per Easily Replace or Attach an IAM Role to an Existing EC2 Instance by Using the EC2 Console.
Deploy PV/PVC/POD as in the manifest.
The status from the kubectl:
kubectl get
NAME READY STATUS RESTARTS AGE IP NODE
nginx 0/1 ContainerCreating 0 29m <none> ip-172-31-1-43.us-west-1.compute.internal
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pv/pv-ebs 5Gi RWO Recycle Bound default/pvc-ebs 33m
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
pvc/pvc-ebs Bound pv-ebs 5Gi RWO 33m
kubectl describe pod nginx
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 27m default-scheduler Successfully assigned nginx to ip-172-31-1-43.us-west-1.compute.internal
Normal SuccessfulMountVolume 27m kubelet, ip-172-31-1-43.us-west-1.compute.internal MountVolume.SetUp succeeded for volume "default-token-dt698"
Warning FailedMount 6s (x12 over 25m) kubelet, ip-172-31-1-43.us-west-1.compute.internal Unable to mount volumes for pod "nginx_default(ddc938ee-edda-11e7-ae06-06bb783bb15c)": timeout expired waiting for volumes to attach/mount for pod "default"/"nginx". Warning FailedMount 6s (x12 over 25m) kubelet, ip-172-31-1-43.us-west-1.compute.internal Unable to mount volumes for pod "nginx_default(ddc938ee-edda-11e7-ae06-06bb783bb15c)": timeout expired waiting for volumes to attach/mount for pod "default"/"nginx".
Manifest
---
kind: PersistentVolume
apiVersion: v1
metadata:
name: pv-ebs
labels:
type: amazonEBS
spec:
capacity:
storage: 5Gi
accessModes:
- ReadWriteOnce
awsElasticBlockStore:
volumeID: vol-0d275986ce24f4304
fsType: ext4
persistentVolumeReclaimPolicy: Recycle
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: pvc-ebs
labels:
type: amazonEBS
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
---
kind: Pod
apiVersion: v1
metadata:
name: nginx
spec:
containers:
- name: myfrontend
image: nginx
volumeMounts:
- mountPath: "/var/www/html"
name: ebs
volumes:
- name: ebs
persistentVolumeClaim:
claimName: pvc-ebs
IAM Policy
Environment
$ kubectl version -o json
{
"clientVersion": {
"major": "1",
"minor": "9",
"gitVersion": "v1.9.0",
"gitCommit": "925c127ec6b946659ad0fd596fa959be43f0cc05",
"gitTreeState": "clean",
"buildDate": "2017-12-15T21:07:38Z",
"goVersion": "go1.9.2",
"compiler": "gc",
"platform": "linux/amd64"
},
"serverVersion": {
"major": "1",
"minor": "9",
"gitVersion": "v1.9.0",
"gitCommit": "925c127ec6b946659ad0fd596fa959be43f0cc05",
"gitTreeState": "clean",
"buildDate": "2017-12-15T20:55:30Z",
"goVersion": "go1.9.2",
"compiler": "gc",
"platform": "linux/amd64"
}
}
$ cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core)
EC2
EBS
Solution
Found the documentation which shows how to configure AWS cloud provider.
K8S AWS Cloud Provider Notes
Steps
Tag EC2 instances and SG with the KubernetesCluster=${kubernetes cluster name}. If created with kubeadm, it is kubernetes as in Ability to configure user and cluster name in AdminKubeConfigFile
Run kubeadm init --config kubeadm.yaml.
kubeadm.yaml (Ansible template)
kind: MasterConfiguration
apiVersion: kubeadm.k8s.io/v1alpha1
api:
advertiseAddress: {{ K8S_ADVERTISE_ADDRESS }}
networking:
podSubnet: {{ K8S_SERVICE_ADDRESSES }}
cloudProvider: {{ K8S_CLOUD_PROVIDER }}
Result
$ journalctl -b -f CONTAINER_ID=$(docker ps | grep k8s_kube-controller-manager | awk '{ print $1 }')
Jan 02 04:48:28 ip-172-31-4-117.us-west-1.compute.internal dockerd-current[8063]: I0102 04:48:28.752141
1 reconciler.go:287] attacherDetacher.AttachVolume started for volume "kuard-pv" (UniqueName: "kubernetes.io/aws-ebs/vol-0d275986ce24f4304") from node "ip-172-3
Jan 02 04:48:39 ip-172-31-4-117.us-west-1.compute.internal dockerd-current[8063]: I0102 04:48:39.309178
1 operation_generator.go:308] AttachVolume.Attach succeeded for volume "kuard-pv" (UniqueName: "kubernetes.io/aws-ebs/vol-0d275986ce24f4304") from node "ip-172-
$ kubectl describe pod kuard
...
Volumes:
kuard-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: kuard-pvc
ReadOnly: false
$ kubectl describe pv kuard-pv
Name: kuard-pv
Labels: failure-domain.beta.kubernetes.io/region=us-west-1
failure-domain.beta.kubernetes.io/zone=us-west-1b
type=amazonEBS
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"PersistentVolume","metadata":{"annotations":{},"labels":{"type":"amazonEBS"},"name":"kuard-pv","namespace":""},"spec":{"acce...
pv.kubernetes.io/bound-by-controller=yes
StorageClass:
Status: Bound
Claim: default/kuard-pvc
Reclaim Policy: Retain
Access Modes: RWO
Capacity: 5Gi
Message:
Source:
Type: AWSElasticBlockStore (a Persistent Disk resource in AWS)
VolumeID: vol-0d275986ce24f4304
FSType: ext4
Partition: 0
ReadOnly: false
Events: <none>
$ kubectl version -o json
{
"clientVersion": {
"major": "1",
"minor": "9",
"gitVersion": "v1.9.0",
"gitCommit": "925c127ec6b946659ad0fd596fa959be43f0cc05",
"gitTreeState": "clean",
"buildDate": "2017-12-15T21:07:38Z",
"goVersion": "go1.9.2",
"compiler": "gc",
"platform": "linux/amd64"
},
"serverVersion": {
"major": "1",
"minor": "9",
"gitVersion": "v1.9.0",
"gitCommit": "925c127ec6b946659ad0fd596fa959be43f0cc05",
"gitTreeState": "clean",
"buildDate": "2017-12-15T20:55:30Z",
"goVersion": "go1.9.2",
"compiler": "gc",
"platform": "linux/amd64"
}
}