kops 'protectKernelDefaults' flag and 'EventRateLimit' admission plugin not working - amazon-web-services

I am trying to implement some of the CIS security benchmark recommendations for Kubernetes 1.21.4 via kOps (1.21.0) on a self-hosted Kubernetes cluster on AWS.
However, when I set protectKernelDefaults: true in the kubelet config and enable the EventRateLimit admission plugin in the kube-apiserver config, the cluster fails to come up.
I am trying to bring up a new cluster with these settings, not update an existing one.
The kOps cluster YAML I am using is:
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: k8s.sample.com
spec:
  cloudLabels:
    team_number: "0"
    environment: "dev"
  api:
    loadBalancer:
      type: Internal
      additionalSecurityGroups:
      - sg-id
      crossZoneLoadBalancing: false
    dns: { }
  authorization:
    rbac: { }
  channel: stable
  cloudProvider: aws
  configBase: s3://state-data/k8s.sample.com
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-eu-west-3a
      name: a
    memoryRequest: 100Mi
    name: main
    env:
    - name: ETCD_MANAGER_HOURLY_BACKUPS_RETENTION
      value: 2d
    - name: ETCD_MANAGER_DAILY_BACKUPS_RETENTION
      value: 1m
    - name: ETCD_LISTEN_METRICS_URLS
      value: http://0.0.0.0:8081
    - name: ETCD_METRICS
      value: basic
  - cpuRequest: 100m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: master-eu-west-3a
      name: a
    memoryRequest: 100Mi
    name: events
    env:
    - name: ETCD_MANAGER_HOURLY_BACKUPS_RETENTION
      value: 2d
    - name: ETCD_MANAGER_DAILY_BACKUPS_RETENTION
      value: 1m
    - name: ETCD_LISTEN_METRICS_URLS
      value: http://0.0.0.0:8081
    - name: ETCD_METRICS
      value: basic
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeControllerManager:
    enableProfiling: false
    logFormat: json
  kubeScheduler:
    logFormat: json
    enableProfiling: false
  kubelet:
    anonymousAuth: false
    logFormat: json
    protectKernelDefaults: true
    tlsCipherSuites: [ TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256 ]
  kubeAPIServer:
    auditLogMaxAge: 7
    auditLogMaxBackups: 1
    auditLogMaxSize: 25
    auditLogPath: /var/log/kube-apiserver-audit.log
    auditPolicyFile: /srv/kubernetes/audit/policy-config.yaml
    enableProfiling: false
    logFormat: json
    enableAdmissionPlugins:
    - NamespaceLifecycle
    - LimitRanger
    - ServiceAccount
    - PersistentVolumeLabel
    - DefaultStorageClass
    - DefaultTolerationSeconds
    - MutatingAdmissionWebhook
    - ValidatingAdmissionWebhook
    - NodeRestriction
    - ResourceQuota
    - AlwaysPullImages
    - EventRateLimit
    - SecurityContextDeny
  fileAssets:
  - name: audit-policy-config
    path: /srv/kubernetes/audit/policy-config.yaml
    roles:
    - Master
    content: |
      apiVersion: audit.k8s.io/v1
      kind: Policy
      rules:
      - level: Metadata
  kubernetesVersion: 1.21.4
  masterPublicName: api.k8s.sample.com
  networkID: vpc-id
  sshKeyName: node_key
  networking:
    calico:
      crossSubnet: true
  nonMasqueradeCIDR: 100.64.0.0/10
  subnets:
  - id: subnet-id1
    name: sn_nodes_1
    type: Private
    zone: eu-west-3a
  - id: subnet-id2
    name: sn_nodes_2
    type: Private
    zone: eu-west-3a
  - id: subnet-id3
    name: sn_utility_1
    type: Utility
    zone: eu-west-3a
  - id: subnet-id4
    name: sn_utility_2
    type: Utility
    zone: eu-west-3a
  topology:
    dns:
      type: Private
    masters: private
    nodes: private
  additionalPolicies:
    node: |
      [
        {
          "Effect": "Allow",
          "Action": [
            "kms:CreateGrant",
            "kms:Decrypt",
            "kms:DescribeKey",
            "kms:Encrypt",
            "kms:GenerateDataKey*",
            "kms:ReEncrypt*"
          ],
          "Resource": [
            "arn:aws:kms:region:xxxx:key/s3access"
          ]
        }
      ]
    master: |
      [
        {
          "Effect": "Allow",
          "Action": [
            "kms:CreateGrant",
            "kms:Decrypt",
            "kms:DescribeKey",
            "kms:Encrypt",
            "kms:GenerateDataKey*",
            "kms:ReEncrypt*"
          ],
          "Resource": [
            "arn:aws:kms:region:xxxx:key/s3access"
          ]
        }
      ]
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: k8s.sample.com
  name: master-eu-west-3a
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210720
  machineType: t3.medium
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: master-eu-west-3a
  role: Master
  subnets:
  - sn_nodes_1
  - sn_nodes_2
  detailedInstanceMonitoring: false
  additionalSecurityGroups:
  - sg-id
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: k8s.sample.com
  name: nodes-eu-west-3a
spec:
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20210720
  machineType: t3.large
  maxSize: 1
  minSize: 1
  nodeLabels:
    kops.k8s.io/instancegroup: nodes-eu-west-3a
  role: Node
  subnets:
  - sn_nodes_1
  - sn_nodes_2
  detailedInstanceMonitoring: false
  additionalSecurityGroups:
  - sg-id
** Note: I have changed some values above to remove specific details. **
I have also tried the protectKernelDefaults and EventRateLimit settings separately, and the cluster fails to come up in each case.
When I set protectKernelDefaults, SSH to the master node, and check the /var/log directory, kube-scheduler.log, kube-proxy.log, kube-controller-manager.log and kube-apiserver.log are empty.
When I enable EventRateLimit, SSH to the master node, and check the /var/log directory, the API server fails to come up and all the other log files contain failures stating that they are unable to connect to the API server.
kube-apiserver.log contains the following:
Log file created at: 2021/08/23 05:35:51
Running on machine: ip-10-100-120-9
Binary: Built with gc go1.16.7 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
Log file created at: 2021/08/23 05:35:54
Running on machine: ip-10-100-120-9
Binary: Built with gc go1.16.7 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
Log file created at: 2021/08/23 05:36:11
Running on machine: ip-10-100-120-9
Binary: Built with gc go1.16.7 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
Log file created at: 2021/08/23 05:36:32
Running on machine: ip-10-100-120-9
Binary: Built with gc go1.16.7 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0823 05:36:32.654990 1 flags.go:59] FLAG: --add-dir-header="false"
Log file created at: 2021/08/23 05:37:15
Running on machine: ip-10-100-120-9
Binary: Built with gc go1.16.7 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
Log file created at: 2021/08/23 05:38:44
Running on machine: ip-10-100-120-9
Binary: Built with gc go1.16.7 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
Log file created at: 2021/08/23 05:41:35
Running on machine: ip-10-100-120-9
Binary: Built with gc go1.16.7 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
Log file created at: 2021/08/23 05:46:47
Running on machine: ip-10-100-120-9
Binary: Built with gc go1.16.7 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
Log file created at: 2021/08/23 05:51:57
Running on machine: ip-10-100-120-9
Binary: Built with gc go1.16.7 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
Log file created at: 2021/08/23 05:56:59
Running on machine: ip-10-100-120-9
Binary: Built with gc go1.16.7 for linux/amd64
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
Any pointers to what is happening would help. Thanks in advance.

The issue with the default kernel settings was a bug in kOps. The installer did not set the sysctl settings that the kubelet expects.
The issue with the admission controller is simply a missing admission controller configuration file.
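To see whether a node is affected by the kernel-defaults problem, the current sysctl values can be compared with what the kubelet expects. The following is a minimal Python sketch, not part of kOps or Kubernetes. The expected values are taken from my reading of the kubelet source and should be treated as an assumption for your version; the helper names (`sysctl_path`, `read_sysctls`, `find_mismatches`) are hypothetical.

```python
# Sysctl values the kubelet is assumed to require when
# protectKernelDefaults is true (verify against your kubelet version).
EXPECTED_SYSCTLS = {
    "vm.overcommit_memory": 1,
    "vm.panic_on_oom": 0,
    "kernel.panic": 10,
    "kernel.panic_on_oops": 1,
    "kernel.keys.root_maxkeys": 1000000,
    "kernel.keys.root_maxbytes": 25000000,
}


def sysctl_path(name):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return "/proc/sys/" + name.replace(".", "/")


def read_sysctls(names):
    """Read the current integer value of each sysctl; None if unreadable."""
    values = {}
    for name in names:
        try:
            with open(sysctl_path(name)) as handle:
                values[name] = int(handle.read().split()[0])
        except (OSError, ValueError, IndexError):
            values[name] = None
    return values


def find_mismatches(current, expected=EXPECTED_SYSCTLS):
    """Return {name: (current_value, expected_value)} for every differing key."""
    return {
        name: (current.get(name), want)
        for name, want in expected.items()
        if current.get(name) != want
    }
```

On a master or worker node, `find_mismatches(read_sysctls(EXPECTED_SYSCTLS))` returns an empty dict when every value already matches. Any entry it does return is a setting the node image or installer would have to apply before the kubelet starts.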

Related

Data Prepper Pipelines + OpenSearch Trace Analytics

I'm using the latest version of AWS OpenSearch, but when I go to the Trace Analytics dashboard it does not show the traces sent by Data Prepper.
The application is manually instrumented with OpenTelemetry
Data Prepper is running in Docker (opensearchproject/data-prepper:latest)
OpenSearch is running on the latest version
Sample Configuration
data-prepper-config.yaml
ssl: false
pipelines.yaml
entry-pipeline:
  delay: "100"
  source:
    otel_trace_source:
      ssl: false
  sink:
    - pipeline:
        name: "raw-pipeline"
    - pipeline:
        name: "service-map-pipeline"
raw-pipeline:
  delay: "100"
  source:
    pipeline:
      name: "entry-pipeline"
  processor:
    - otel_trace_raw:
  sink:
    - opensearch:
        hosts: [ "https://opensearch-domain" ]
        username: "admin"
        password: "admin"
        index_type: trace-analytics-raw
service-map-pipeline:
  delay: "100"
  source:
    pipeline:
      name: "entry-pipeline"
  processor:
    - service_map_stateful:
  sink:
    - opensearch:
        hosts: ["https://opensearch-domain"]
        username: "admin"
        password: "admin"
        index_type: trace-analytics-service-map
remote-collector.yaml
...
exporters:
  otlp/data-prepper:
    endpoint: data-prepper-address:21890
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/data-prepper]
When I go to the Query Workbench and run the query SELECT * FROM otel-v1-apm-span, I get the list of received trace spans. But I'm unable to see any charts on the Trace Analytics dashboard (both Traces and Services). It's just an empty dashboard.
I'm also getting a warning:
WARN org.opensearch.dataprepper.plugins.processor.oteltrace.OTelTraceRawProcessor - Missing trace group for SpanId: xxxxxxxxxxxx
The traceGroupFields are also empty.
"traceGroupFields": {
"endTime": null,
"durationInNanos": null,
"statusCode": null
}
Is there something wrong with my setup? Any help is appreciated.
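One thing that may be worth checking, as an assumption rather than a confirmed diagnosis: as far as I understand, Data Prepper derives the trace group fields from a trace's root span, meaning the span whose parentSpanId is empty. If a manually instrumented application never exports that root span, the warning above would be the expected result. The Python sketch below takes span documents as exported from the otel-v1-apm-span index and lists the traces that have no root span. The function name is hypothetical, and the `traceId` and `parentSpanId` field names are assumed to match the stored documents.

```python
from collections import defaultdict


def traces_without_root(spans):
    """Return the sorted trace IDs that contain no span with an empty parentSpanId."""
    has_root = defaultdict(bool)
    for span in spans:
        trace_id = span["traceId"]
        # A root span has no parent; treat a missing or empty field the same way.
        is_root = not span.get("parentSpanId")
        has_root[trace_id] = has_root[trace_id] or is_root
    return sorted(trace_id for trace_id, found in has_root.items() if not found)
```

If this returns most or all trace IDs, the instrumentation or the collector pipeline is probably dropping the top-level span, and that is the place to look before changing the Data Prepper configuration.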

Merge kubectl config into ~/.kube/config

To use kubectl (to talk to the Kubernetes API server) with a merged config, the two commands below can be used to merge kubeconfig files:
KUBECONFIG=~/.kube/config:/path/to/another/config.yml kubectl config view --flatten > ~/.kube/config-new.yaml
and then
cp ~/.kube/config-new.yaml ~/.kube/config
After running these two commands, ~/.kube/config will contain the merged config.
But in our case, instead of using kubectl, we are planning a Go tool that uses the client-go library, as part of automation.
The Go tool maintains kubeconfig data retrieved from MongoDB (every 10 minutes) and stored in a []map[string]string (shown below):
[
{
"name" : "cluster-1-in-gcp",
"kubernetes-version": "1.16",
"server": "https://192.168.10.190:6443",
"user": "kubernetes-admin-1",
"client-certificate": "sadfhdsjfkhsdjfklhsdjfkassdfsd",
"client-key": "sahgjkshfgjkdf",
},
{
"name" : "cluster-1-in-aws",
"kubernetes-version": "1.17",
"server": "https://192.168.11.191:6443",
"user": "kubernetes-admin-2",
"client-certificate": "ssssshdsjfkhsdjfklhsdjfkassdfsd",
"client-key": "pppppsahgjkshfgjkdf",
},
{
"name" : "cluster-1-in-aks",
"kubernetes-version": "1.18",
"server": "https://192.168.11.192:6443",
"user": "kubernetes-admin-3",
"client-certificate": "oooossssshdsjfkhsdjfklhsdjfkassdfsd",
"client-key": "tttttpppppsahgjkshfgjkdf",
},
]
client-certificate and client-key are PEM-format certificates (stored as strings).
The certificates will be stored as shown below:
apiVersion: v1
clusters:
- cluster:
    server: https://192.168.10.190:6443
  name: cluster-1
- cluster:
    server: https://192.168.99.101:8443
  name: cluster-2
contexts:
- context:
    cluster: cluster-1
    user: kubernetes-admin-1
  name: cluster-1
- context:
    cluster: cluster-2
    user: kubernetes-admin-2
  name: cluster-2
kind: Config
preferences: {}
users:
- name: kubernetes-admin-1
  user:
    client-certificate: /home/user/.minikube/credential-for-cluster-1.crt
    client-key: /home/user/.minikube/credential-for-cluster-1.key
- name: kubernetes-admin-2
  user:
    client-certificate: /home/user/.minikube/credential-for-cluster-2.crt
    client-key: /home/user/.minikube/credential-for-cluster-2.key
I am looking for a Go package to merge this kubeconfig data ([]map[string]string) into ~/.kube/config.
Does the clientcmd package of the client-go library support this merge functionality?
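Whatever library ends up doing the work, the merge itself is a by-name union of the clusters, contexts and users sections. The Python sketch below only illustrates that data shape and is not client-go code; the function names are hypothetical. It embeds the PEM strings inline through the `client-certificate-data` and `client-key-data` fields (base64-encoded) instead of writing certificate files, which is an assumption about what the tool would want.

```python
import base64


def _b64(text):
    """Base64-encode a PEM string for the *-data kubeconfig fields."""
    return base64.b64encode(text.encode()).decode()


def to_kubeconfig(records):
    """Build a kubeconfig-shaped dict from the flat per-cluster records."""
    config = {"apiVersion": "v1", "kind": "Config", "preferences": {},
              "clusters": [], "contexts": [], "users": []}
    for rec in records:
        config["clusters"].append(
            {"name": rec["name"], "cluster": {"server": rec["server"]}})
        config["users"].append({"name": rec["user"], "user": {
            "client-certificate-data": _b64(rec["client-certificate"]),
            "client-key-data": _b64(rec["client-key"]),
        }})
        config["contexts"].append({"name": rec["name"], "context": {
            "cluster": rec["name"], "user": rec["user"]}})
    return config


def merge_kubeconfig(base, extra):
    """Return a copy of base with extra merged in; extra wins on a name clash."""
    merged = dict(base)
    for section in ("clusters", "contexts", "users"):
        by_name = {entry["name"]: entry for entry in base.get(section, [])}
        by_name.update({entry["name"]: entry for entry in extra.get(section, [])})
        merged[section] = list(by_name.values())
    return merged
```

Letting the newer record win is a deliberate choice here, because the records are refreshed from MongoDB every 10 minutes. It differs from kubectl, where the first file listed in KUBECONFIG wins on a conflict. Reading and writing ~/.kube/config is left out of the sketch.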

How to connect from a public GKE pod to a GCP Cloud SQL using a private connection

I have a Java application running in a Docker container, which I deploy to my GKE cluster. I'd like it to connect to a CloudSQL instance via private IP, but I have been struggling for two days to get it working. I followed this guide:
https://cloud.google.com/sql/docs/mysql/configure-private-services-access#gcloud_1
I managed to create a private service connection and also gave my CloudSQL instance the address range. As far as I understand, this should be sufficient for the Pod to connect to the CloudSQL instance.
However, it just does not work. I pass the CloudSQL private IP as the host for the Java application's JDBC (database) connection.
2022-02-14 22:03:31.299 WARN 1 --- [ main] o.h.e.j.e.i.JdbcEnvironmentInitiator : HHH000342: Could not obtain connection to query metadata
com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
Here are some details to the problem.
The address allocation
➜ google-cloud-sdk gcloud compute addresses list
NAME                             ADDRESS/RANGE  TYPE      PURPOSE      NETWORK  REGION  SUBNET  STATUS
google-managed-services-default  10.77.0.0/16   INTERNAL  VPC_PEERING  default                  RESERVED
The vpc peering connection
➜ google-cloud-sdk gcloud services vpc-peerings list --network=default
---
network: projects/1071923183712/global/networks/default
peering: servicenetworking-googleapis-com
reservedPeeringRanges:
- google-managed-services-default
service: services/servicenetworking.googleapis.com
Here is my CloudSQL info. Please note that the private IP address is 10.77.0.5 and therefore matches the address range 10.77.0.0/16 from above. I guess this part is working.
➜ google-cloud-sdk gcloud sql instances describe alpha-3
backendType: SECOND_GEN
connectionName: barbarus:europe-west4:alpha-3
createTime: '2022-02-14T19:28:02.465Z'
databaseInstalledVersion: MYSQL_5_7_36
databaseVersion: MYSQL_5_7
etag: 758de240b161b946689e5732d8e71d396c772c0e03904c46af3b61f59b1038a0
gceZone: europe-west4-a
instanceType: CLOUD_SQL_INSTANCE
ipAddresses:
- ipAddress: 34.90.174.243
type: PRIMARY
- ipAddress: 10.77.0.5
type: PRIVATE
kind: sql#instance
name: alpha-3
project: barbarus
region: europe-west4
selfLink: https://sqladmin.googleapis.com/sql/v1beta4/projects/barbarus/instances/alpha-3
serverCaCert:
cert: |-
-----BEGIN CERTIFICATE-----
//...
-----END CERTIFICATE-----
certSerialNumber: '0'
commonName: C=US,O=Google\, Inc,CN=Google Cloud SQL Server CA,dnQualifier=d495898b-f6c7-4e2f-9c59-c02ccf2c1395
createTime: '2022-02-14T19:29:35.325Z'
expirationTime: '2032-02-12T19:30:35.325Z'
instance: alpha-3
kind: sql#sslCert
sha1Fingerprint: 3ee799b139bf335ef39554b07a5027c9319087cb
serviceAccountEmailAddress: p1071923183712-d99fsz@gcp-sa-cloud-sql.iam.gserviceaccount.com
settings:
activationPolicy: ALWAYS
availabilityType: ZONAL
backupConfiguration:
backupRetentionSettings:
retainedBackups: 7
retentionUnit: COUNT
binaryLogEnabled: true
enabled: true
kind: sql#backupConfiguration
location: us
startTime: 12:00
transactionLogRetentionDays: 7
dataDiskSizeGb: '10'
dataDiskType: PD_HDD
ipConfiguration:
allocatedIpRange: google-managed-services-default
ipv4Enabled: true
privateNetwork: projects/barbarus/global/networks/default
requireSsl: false
kind: sql#settings
locationPreference:
kind: sql#locationPreference
zone: europe-west4-a
pricingPlan: PER_USE
replicationType: SYNCHRONOUS
settingsVersion: '1'
storageAutoResize: true
storageAutoResizeLimit: '0'
tier: db-f1-micro
state: RUNNABLE
The problem I see is with the Pod's IP address. It is 10.0.5.3, which is not in the range 10.77.0.0/16, and therefore the pod can't see the CloudSQL instance.
Here is the Pod's info:
Name: game-server-5b9dd47cbd-vt2gw
Namespace: default
Priority: 0
Node: gke-barbarus-node-pool-1a5ea7d5-bg3m/10.164.15.216
Start Time: Tue, 15 Feb 2022 00:33:56 +0100
Labels: app=game-server
app.kubernetes.io/managed-by=gcp-cloud-build-deploy
pod-template-hash=5b9dd47cbd
Annotations: <none>
Status: Running
IP: 10.0.5.3
IPs:
IP: 10.0.5.3
Controlled By: ReplicaSet/game-server-5b9dd47cbd
Containers:
game-server:
Container ID: containerd://57d9540b1e5f5cb3fcc4517fa42377282943d292ba810c83cd7eb50bd4f1e3dd
Image: eu.gcr.io/barbarus/game-server@sha256:72d518a53652d32d0d438d2a5443c44cc8e12bb15cb1a59c843ce72466900141
Image ID: eu.gcr.io/barbarus/game-server@sha256:72d518a53652d32d0d438d2a5443c44cc8e12bb15cb1a59c843ce72466900141
Port: <none>
Host Port: <none>
State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 15 Feb 2022 00:36:48 +0100
Finished: Tue, 15 Feb 2022 00:38:01 +0100
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 15 Feb 2022 00:35:23 +0100
Finished: Tue, 15 Feb 2022 00:36:35 +0100
Ready: False
Restart Count: 2
Environment:
SQL_CONNECTION: <set to the key 'SQL_CONNECTION' of config map 'game-server'> Optional: false
SQL_USER: <set to the key 'SQL_USER' of config map 'game-server'> Optional: false
SQL_DATABASE: <set to the key 'SQL_DATABASE' of config map 'game-server'> Optional: false
SQL_PASSWORD: <set to the key 'SQL_PASSWORD' of config map 'game-server'> Optional: false
LOG_LEVEL: <set to the key 'LOG_LEVEL' of config map 'game-server'> Optional: false
WORLD_ID: <set to the key 'WORLD_ID' of config map 'game-server'> Optional: false
WORLD_SIZE: <set to the key 'WORLD_SIZE' of config map 'game-server'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sknlk (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-sknlk:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m7s default-scheduler Successfully assigned default/game-server-5b9dd47cbd-vt2gw to gke-barbarus-node-pool-1a5ea7d5-bg3m
Normal Pulling 4m6s kubelet Pulling image "eu.gcr.io/barbarus/game-server@sha256:72d518a53652d32d0d438d2a5443c44cc8e12bb15cb1a59c843ce72466900141"
Normal Pulled 3m55s kubelet Successfully pulled image "eu.gcr.io/barbarus/game-server@sha256:72d518a53652d32d0d438d2a5443c44cc8e12bb15cb1a59c843ce72466900141" in 11.09487284s
Normal Created 75s (x3 over 3m54s) kubelet Created container game-server
Normal Started 75s (x3 over 3m54s) kubelet Started container game-server
Normal Pulled 75s (x2 over 2m41s) kubelet Container image "eu.gcr.io/barbarus/game-server@sha256:72d518a53652d32d0d438d2a5443c44cc8e12bb15cb1a59c843ce72466900141" already present on machine
Warning BackOff 1s (x2 over 87s) kubelet Back-off restarting failed container
Finally this is what gcloud container clusters describe gives me:
➜ google-cloud-sdk gcloud container clusters describe --region=europe-west4 barbarus
addonsConfig:
gcePersistentDiskCsiDriverConfig:
enabled: true
kubernetesDashboard:
disabled: true
networkPolicyConfig:
disabled: true
autopilot: {}
autoscaling:
autoscalingProfile: BALANCED
binaryAuthorization: {}
clusterIpv4Cidr: 10.0.0.0/14
createTime: '2022-02-14T19:34:03+00:00'
currentMasterVersion: 1.21.6-gke.1500
currentNodeCount: 3
currentNodeVersion: 1.21.6-gke.1500
databaseEncryption:
state: DECRYPTED
endpoint: 34.141.141.150
id: 39e7249b48c24d23a8b70b0c11cd18901565336b397147dab4778dc75dfc34e2
initialClusterVersion: 1.21.6-gke.1500
initialNodeCount: 1
instanceGroupUrls:
- https://www.googleapis.com/compute/v1/projects/barbarus/zones/europe-west4-a/instanceGroupManagers/gke-barbarus-node-pool-e291e3d6-grp
- https://www.googleapis.com/compute/v1/projects/barbarus/zones/europe-west4-b/instanceGroupManagers/gke-barbarus-node-pool-5aa35c39-grp
- https://www.googleapis.com/compute/v1/projects/barbarus/zones/europe-west4-c/instanceGroupManagers/gke-barbarus-node-pool-380645b7-grp
ipAllocationPolicy:
useRoutes: true
labelFingerprint: a9dc16a7
legacyAbac: {}
location: europe-west4
locations:
- europe-west4-a
- europe-west4-b
- europe-west4-c
loggingConfig:
componentConfig:
enableComponents:
- SYSTEM_COMPONENTS
- WORKLOADS
loggingService: logging.googleapis.com/kubernetes
maintenancePolicy:
resourceVersion: e3b0c442
masterAuth:
clusterCaCertificate: // ...
masterAuthorizedNetworksConfig: {}
monitoringConfig:
componentConfig:
enableComponents:
- SYSTEM_COMPONENTS
monitoringService: monitoring.googleapis.com/kubernetes
name: barbarus
network: default
networkConfig:
defaultSnatStatus: {}
network: projects/barbarus/global/networks/default
serviceExternalIpsConfig: {}
subnetwork: projects/barbarus/regions/europe-west4/subnetworks/default
nodeConfig:
diskSizeGb: 100
diskType: pd-standard
imageType: COS_CONTAINERD
machineType: e2-medium
metadata:
disable-legacy-endpoints: 'true'
oauthScopes:
- https://www.googleapis.com/auth/cloud-platform
preemptible: true
serviceAccount: default@barbarus.iam.gserviceaccount.com
shieldedInstanceConfig:
enableIntegrityMonitoring: true
nodeIpv4CidrSize: 24
nodePoolDefaults:
nodeConfigDefaults: {}
nodePools:
- config:
diskSizeGb: 100
diskType: pd-standard
imageType: COS_CONTAINERD
machineType: e2-medium
metadata:
disable-legacy-endpoints: 'true'
oauthScopes:
- https://www.googleapis.com/auth/cloud-platform
preemptible: true
serviceAccount: default@barbarus.iam.gserviceaccount.com
shieldedInstanceConfig:
enableIntegrityMonitoring: true
initialNodeCount: 1
instanceGroupUrls:
- https://www.googleapis.com/compute/v1/projects/barbarus/zones/europe-west4-a/instanceGroupManagers/gke-barbarus-node-pool-e291e3d6-grp
- https://www.googleapis.com/compute/v1/projects/barbarus/zones/europe-west4-b/instanceGroupManagers/gke-barbarus-node-pool-5aa35c39-grp
- https://www.googleapis.com/compute/v1/projects/barbarus/zones/europe-west4-c/instanceGroupManagers/gke-barbarus-node-pool-380645b7-grp
locations:
- europe-west4-a
- europe-west4-b
- europe-west4-c
management:
autoRepair: true
autoUpgrade: true
name: node-pool
podIpv4CidrSize: 24
selfLink: https://container.googleapis.com/v1/projects/barbarus/locations/europe-west4/clusters/barbarus/nodePools/node-pool
status: RUNNING
upgradeSettings:
maxSurge: 1
version: 1.21.6-gke.1500
notificationConfig:
pubsub: {}
releaseChannel:
channel: REGULAR
selfLink: https://container.googleapis.com/v1/projects/barbarus/locations/europe-west4/clusters/barbarus
servicesIpv4Cidr: 10.3.240.0/20
shieldedNodes:
enabled: true
status: RUNNING
subnetwork: default
zone: europe-west4
I have no idea how to give the pod a reference to the address allocation I made for the private service connection.
What I tried was to spin up a GKE cluster with a default pod address range of 10.77.0.0/16, which sounded logical since I want the pods to appear in the same address range as the CloudSQL instance. However, GCP gives me an error when I try to do that:
(1) insufficient regional quota to satisfy request: resource "CPUS": request requires '9.0' and is short '1.0'. project has a quota of '8.0' with '8.0' available. View and manage quotas at https://console.cloud.google.com/iam-admin/quotas?usage=USED&project=hait-barbarus (2) insufficient regional quota to satisfy request: resource "IN_USE_ADDRESSES": request requires '9.0' and is short '5.0'. project has a quota of '4.0' with '4.0' available. View and manage quotas at https://console.cloud.google.com/iam-admin/quotas?usage=USED&project=hait-barbarus.
So if I am not able to give the pods the proper address range for the private service connection, how can they ever discover the CloudSQL instance?
EDIT #1: The GKE cluster's service account has the SQL Client role.
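The address arithmetic in the question can be checked with Python's standard `ipaddress` module. The sketch below uses only the CIDR values quoted above; the function names and the labels in `RANGES` are hypothetical. It confirms that the pod IP sits in the cluster's pod range and that this range does not overlap the 10.77.0.0/16 allocation. As far as I understand private services access, that is the intended layout: the allocated range is reserved for the Google-managed side of the peering, so pods reach 10.77.0.5 through VPC routing rather than by sharing the range.

```python
import ipaddress

# CIDR ranges quoted in the gcloud output above.
RANGES = {
    "pods (clusterIpv4Cidr)": "10.0.0.0/14",
    "services (servicesIpv4Cidr)": "10.3.240.0/20",
    "private services access (google-managed-services-default)": "10.77.0.0/16",
}


def containing_ranges(ip, named_ranges):
    """Return the names of all CIDR ranges that contain the given IP."""
    address = ipaddress.ip_address(ip)
    return [name for name, cidr in named_ranges.items()
            if address in ipaddress.ip_network(cidr)]


def ranges_overlap(cidr_a, cidr_b):
    """Report whether two CIDR ranges share any address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))
```

Calling `containing_ranges("10.0.5.3", RANGES)` places the pod in the pod range only, and `ranges_overlap("10.0.0.0/14", "10.77.0.0/16")` is false, so giving the pods 10.77.0.0/16 would create an overlap rather than fix the connection. One thing worth checking, offered as an assumption and not a confirmed diagnosis: the cluster description shows `ipAllocationPolicy: useRoutes: true`, and to my knowledge Google's documentation asks for a VPC-native cluster when connecting to Cloud SQL over private IP.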

CrashLoopBackOff Error when deploying Django app on GKE (Kubernetes)

Folks,
What problem still persists:
I have gotten past the CrashLoopBackOff by fixing the Dockerfile run command as suggested by Emil Gi. However, the external IP is not forwarding to my library app server pod.
Status
Fixed the port to 8080 in the Dockerfile and ensured it is consistent across the Dockerfile and library.yaml.
Made sure the Dockerfile has proper commands so that the container doesn't terminate immediately after startup; this was what was causing the CrashLoopBackOff.
The remaining problem is that the load balancer external IP I click on gives this error: "This site can’t be reached. 34.93.141.11 refused to connect."
Original Question:
How do I resolve this CrashLoopBackOff? I looked at many docs and tried debugging, but I am unsure what is causing it. The app runs perfectly in local mode and even deploys smoothly to App Engine standard, but not to GKE. Any pointers to debug this further are most appreciated.
Problem: The cloudsql-proxy container is running, but the library-app container has a CrashLoopBackOff error. The pod is assigned to a node, pulls the images, starts the containers, and then goes into this BackOff state.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
library-7699b84747-9skst 1/2 CrashLoopBackOff 28 121m
$ kubectl logs library-7699b84747-9skst
Error from server (BadRequest): a container name must be specified for pod library-7699b84747-9skst, choose one of: [library-app cloudsql-proxy]
​$ kubectl describe pods library-7699b84747-9skst
Name: library-7699b84747-9skst
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: gke-library-default-pool-35b5943a-ps5v/10.160.0.13
Start Time: Fri, 06 Dec 2019 09:34:11 +0530
Labels: app=library
pod-template-hash=7699b84747
Annotations: kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container library-app; cpu request for container cloudsql-proxy
Status: Running
IP: 10.16.0.10
Controlled By: ReplicaSet/library-7699b84747
Containers:
library-app:
Container ID: docker://e7d8aac3dff318de34f750c3f1856cd754aa96a7203772de748b3e397441a609
Image: gcr.io/library-259506/library
Image ID: docker-pullable://gcr.io/library-259506/library@sha256:07f54e055621ab6ddcbb49666984501cf98c95133bcf7405ca076322fb0e4108
Port: 8080/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 06 Dec 2019 09:35:07 +0530
Finished: Fri, 06 Dec 2019 09:35:07 +0530
Ready: False
Restart Count: 2
Requests:
cpu: 100m
Environment:
DATABASE_USER: <set to the key 'username' in secret 'cloudsql'> Optional: false
DATABASE_PASSWORD: <set to the key 'password' in secret 'cloudsql'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-kj497 (ro)
cloudsql-proxy:
Container ID: docker://352284231e7f02011dd1ab6999bf9a283b334590435278442e9a04d4d0684405
Image: gcr.io/cloudsql-docker/gce-proxy:1.16
Image ID: docker-pullable://gcr.io/cloudsql-docker/gce-proxy@sha256:7d302c849bebee8a3fc90a2705c02409c44c91c813991d6e8072f092769645cf
Port: <none>
Host Port: <none>
Command:
/cloud_sql_proxy
--dir=/cloudsql
-instances=library-259506:asia-south1:library=tcp:3306
-credential_file=/secrets/cloudsql/credentials.json
State: Running
Started: Fri, 06 Dec 2019 09:34:51 +0530
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/cloudsql from cloudsql (rw)
/etc/ssl/certs from ssl-certs (rw)
/secrets/cloudsql from cloudsql-oauth-credentials (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-kj497 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
cloudsql-oauth-credentials:
Type: Secret (a volume populated by a Secret)
SecretName: cloudsql-oauth-credentials
Optional: false
ssl-certs:
Type: HostPath (bare host directory volume)
Path: /etc/ssl/certs
HostPathType:
cloudsql:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
default-token-kj497:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-kj497
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 86s default-scheduler Successfully assigned default/library-7699b84747-9skst to gke-library-default-pool-35b5943a-ps5v
Normal Pulling 50s kubelet, gke-library-default-pool-35b5943a-ps5v pulling image "gcr.io/cloudsql-docker/gce-proxy:1.16"
Normal Pulled 47s kubelet, gke-library-default-pool-35b5943a-ps5v Successfully pulled image "gcr.io/cloudsql-docker/gce-proxy:1.16"
Normal Created 46s kubelet, gke-library-default-pool-35b5943a-ps5v Created container
Normal Started 46s kubelet, gke-library-default-pool-35b5943a-ps5v Started container
Normal Pulling 2s (x4 over 85s) kubelet, gke-library-default-pool-35b5943a-ps5v pulling image "gcr.io/library-259506/library"
Normal Created 1s (x4 over 50s) kubelet, gke-library-default-pool-35b5943a-ps5v Created container
Normal Started 1s (x4 over 50s) kubelet, gke-library-default-pool-35b5943a-ps5v Started container
Normal Pulled 1s (x4 over 52s) kubelet, gke-library-default-pool-35b5943a-ps5v Successfully pulled image "gcr.io/library-259506/library"
Warning BackOff 1s (x5 over 43s) kubelet, gke-library-default-pool-35b5943a-ps5v Back-off restarting failed container​
Here is the library.yaml file I have to go with it.
# [START kubernetes_deployment]
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: library
  labels:
    app: library
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: library
    spec:
      containers:
      - name: library-app
        # Replace with your project ID or use `make template`
        image: gcr.io/library-259506/library
        # This setting makes nodes pull the docker image every time before
        # starting the pod. This is useful when debugging, but should be turned
        # off in production.
        imagePullPolicy: Always
        env:
          # [START cloudsql_secrets]
          - name: DATABASE_USER
            valueFrom:
              secretKeyRef:
                name: cloudsql
                key: username
          - name: DATABASE_PASSWORD
            valueFrom:
              secretKeyRef:
                name: cloudsql
                key: password
          # [END cloudsql_secrets]
        ports:
        - containerPort: 8080
      # [START proxy_container]
      - image: gcr.io/cloudsql-docker/gce-proxy:1.16
        name: cloudsql-proxy
        command: ["/cloud_sql_proxy", "--dir=/cloudsql",
                  "-instances=library-259506:asia-south1:library=tcp:3306",
                  "-credential_file=/secrets/cloudsql/credentials.json"]
        volumeMounts:
          - name: cloudsql-oauth-credentials
            mountPath: /secrets/cloudsql
            readOnly: true
          - name: ssl-certs
            mountPath: /etc/ssl/certs
          - name: cloudsql
            mountPath: /cloudsql
      # [END proxy_container]
      # [START volumes]
      volumes:
        - name: cloudsql-oauth-credentials
          secret:
            secretName: cloudsql-oauth-credentials
        - name: ssl-certs
          hostPath:
            path: /etc/ssl/certs
        - name: cloudsql
          emptyDir:
      # [END volumes]
# [END kubernetes_deployment]
---
# [START service]
# The library-svc service provides a load-balancing proxy over the polls app
# pods. By specifying the type as a 'LoadBalancer', Container Engine will
# create an external HTTP load balancer.
# The service directs traffic to the deployment by matching the service's selector to the deployment's label
#
# For more information about external HTTP load balancing see:
# https://cloud.google.com/container-engine/docs/load-balancer
apiVersion: v1
kind: Service
metadata:
  name: library-svc
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: library
# [END service]
More error status
Container 'library-app' keeps crashing.
CrashLoopBackOff
Reason
Container 'library-app' keeps crashing.
Check Pod's logs to see more details. Learn more
Source
library-7699b84747-9skst
Conditions
Initialized: True Ready: False ContainersReady: False PodScheduled: True
- lastProbeTime: null
lastTransitionTime: "2019-12-06T06:03:43Z"
message: 'containers with unready status: [library-app]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
Key Events
Back-off restarting failed container  BackOff  Dec 6, 2019, 9:34:54 AM  Dec 6, 2019, 12:24:26 PM  779
pulling image "gcr.io/library-259506/library"  Pulling  Dec 6, 2019, 9:34:12 AM  Dec 6, 2019, 11:59:26 AM  34
The Dockerfile is as follows (this fixed the CrashLoop btw):
FROM python:3
ENV PYTHONUNBUFFERED 1
RUN mkdir /code
WORKDIR /code
COPY requirements.txt /code/
RUN pip install -r requirements.txt
COPY . /code/
# Server
EXPOSE 8080
STOPSIGNAL SIGINT
ENTRYPOINT ["python", "manage.py"]
CMD ["runserver", "0.0.0.0:8080"]
I think several things came together:
The database password contained a special character, so it had to be put in quotes. I also made sure the port numbers were consistent across the Dockerfile and library.yaml. After that the secrets actually worked; before the fix, the logs showed a password mismatch.
IMPORTANT: the command-line fix from Emil G, which ensures the container built from my Dockerfile doesn't exit right away. Make sure the CMD actually works and runs your server.
IMPORTANT: Finally, I found a fix for the external IP not connecting to my server. See this thread, where I explain what went wrong (basically, I needed a security context with runAsUser set so that the container does not run as root): "RunAsUser issue & Clicking external IP of load balancer -> Bad Request (400) on deploying Django app on GKE (Kubernetes) and db connection failing".
I also documented all steps to deploy step 1-15 and
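The post does not say where the password had to be quoted, so the following is only a minimal sketch under an assumption: that the quoting problem occurred on the shell command line when creating the secret. It shows two ways to avoid the issue: base64-encoding the value for the `data` field of a Secret manifest, and shell-quoting a `--from-literal` argument for `kubectl create secret generic`. The password and key name are hypothetical.

```python
import base64
import shlex


def secret_data_value(password: str) -> str:
    """Base64-encode a password, as the `data` field of a Kubernetes Secret expects."""
    return base64.b64encode(password.encode("utf-8")).decode("ascii")


def from_literal_arg(key: str, password: str) -> str:
    """Build a shell-safe --from-literal argument for `kubectl create secret generic`."""
    return "--from-literal=" + shlex.quote(f"{key}={password}")


# Hypothetical password containing characters the shell would otherwise interpret.
example_password = "p@ss$word!"

print(secret_data_value(example_password))
print(from_literal_arg("password", example_password))
```

Decoding the base64 value, or splitting the quoted argument with `shlex.split`, should return the original password unchanged, which is a quick way to check that no character was lost.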

K8S Unable to mount AWS EBS as a persistent volume for pod

Question
Please suggest what could be preventing the AWS EBS volume from being mounted in the pod.
journalctl -b -f -u kubelet
1480 kubelet.go:1625] Unable to mount volumes for pod "nginx_default(ddc938ee-edda-11e7-ae06-06bb783bb15c)": timeout expired waiting for volumes to attach/mount for pod "default"/"nginx". list of unattached/unmounted volumes=[ebs]; skipping pod
1480 pod_workers.go:186] Error syncing pod ddc938ee-edda-11e7-ae06-06bb783bb15c ("nginx_default(ddc938ee-edda-11e7-ae06-06bb783bb15c)"), skipping: timeout expired waiting for volumes to attach/mount for pod "default"/"nginx". list of unattached/unmounted volumes=[ebs]
1480 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "pv-ebs" (UniqueName: "kubernetes.io/aws-ebs/vol-0d275986ce24f4304") pod "nginx" (UID: "ddc938ee-edda-11e7-ae06-06bb783bb15c")
1480 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/aws-ebs/vol-0d275986ce24f4304\"" failed. No retries permitted until 2017-12-31 03:34:03.644604131 +0000 UTC m=+6842.543441523 (durationBeforeRetry 2m2s). Error: "Volume not attached according to node status for volume \"pv-ebs\" (UniqueName: \"kubernetes.io/aws-ebs/vol-0d275986ce24f4304\") pod \"nginx\" (UID: \"ddc938ee-edda-11e7-ae06-06bb783bb15c\") "
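When scanning a long kubelet journal, it can help to pull out just the volume names the kubelet reports as unattached, so they can be compared with the `volumes` entries in the pod manifest. The sketch below is a hypothetical helper, not part of any Kubernetes tooling; it assumes the message format shown in the log lines above and accepts either commas or spaces between names.

```python
import re
from typing import List

# Matches the bracketed list in "list of unattached/unmounted volumes=[...]".
UNATTACHED = re.compile(r"list of unattached/unmounted volumes=\[([^\]]*)\]")


def unattached_volumes(log_line: str) -> List[str]:
    """Return the volume names a kubelet log line reports as unattached/unmounted."""
    match = UNATTACHED.search(log_line)
    if match is None:
        return []
    # Split on commas or whitespace and drop empty fragments.
    return [name for name in re.split(r"[,\s]+", match.group(1)) if name]


sample = (
    'kubelet.go:1625] Unable to mount volumes for pod "nginx_default(...)": '
    "timeout expired waiting for volumes to attach/mount for pod "
    '"default"/"nginx". list of unattached/unmounted volumes=[ebs]; skipping pod'
)
print(unattached_volumes(sample))  # ['ebs']
```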
Steps
Deployed K8S 1.9 using kubeadm in AWS (region us-west-1, AZ us-west-1b). Without the EBS volume mount, pods work.
Configured an IAM role as per "Kubernetes - Cloud Providers" and "kubelets failing to start when using 'aws' as cloud provider".
Assigned the IAM role to the EC2 instances as per "Easily Replace or Attach an IAM Role to an Existing EC2 Instance by Using the EC2 Console".
Deployed the PV/PVC/Pod as in the manifest.
The status from kubectl:
kubectl get
NAME READY STATUS RESTARTS AGE IP NODE
nginx 0/1 ContainerCreating 0 29m <none> ip-172-31-1-43.us-west-1.compute.internal
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pv/pv-ebs 5Gi RWO Recycle Bound default/pvc-ebs 33m
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
pvc/pvc-ebs Bound pv-ebs 5Gi RWO 33m
kubectl describe pod nginx
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 27m default-scheduler Successfully assigned nginx to ip-172-31-1-43.us-west-1.compute.internal
Normal SuccessfulMountVolume 27m kubelet, ip-172-31-1-43.us-west-1.compute.internal MountVolume.SetUp succeeded for volume "default-token-dt698"
Warning FailedMount 6s (x12 over 25m) kubelet, ip-172-31-1-43.us-west-1.compute.internal Unable to mount volumes for pod "nginx_default(ddc938ee-edda-11e7-ae06-06bb783bb15c)": timeout expired waiting for volumes to attach/mount for pod "default"/"nginx".
Manifest
---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv-ebs
  labels:
    type: amazonEBS
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  awsElasticBlockStore:
    volumeID: vol-0d275986ce24f4304
    fsType: ext4
  persistentVolumeReclaimPolicy: Recycle
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-ebs
  labels:
    type: amazonEBS
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
kind: Pod
apiVersion: v1
metadata:
  name: nginx
spec:
  containers:
    - name: myfrontend
      image: nginx
      volumeMounts:
        - mountPath: "/var/www/html"
          name: ebs
  volumes:
    - name: ebs
      persistentVolumeClaim:
        claimName: pvc-ebs
IAM Policy
Environment
$ kubectl version -o json
{
"clientVersion": {
"major": "1",
"minor": "9",
"gitVersion": "v1.9.0",
"gitCommit": "925c127ec6b946659ad0fd596fa959be43f0cc05",
"gitTreeState": "clean",
"buildDate": "2017-12-15T21:07:38Z",
"goVersion": "go1.9.2",
"compiler": "gc",
"platform": "linux/amd64"
},
"serverVersion": {
"major": "1",
"minor": "9",
"gitVersion": "v1.9.0",
"gitCommit": "925c127ec6b946659ad0fd596fa959be43f0cc05",
"gitTreeState": "clean",
"buildDate": "2017-12-15T20:55:30Z",
"goVersion": "go1.9.2",
"compiler": "gc",
"platform": "linux/amd64"
}
}
$ cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core)
EC2
EBS
Solution
I found the documentation that shows how to configure the AWS cloud provider:
K8S AWS Cloud Provider Notes
Steps
Tag the EC2 instances and the security group (SG) with KubernetesCluster=${kubernetes cluster name}. For a cluster created with kubeadm, the cluster name is kubernetes, as discussed in "Ability to configure user and cluster name in AdminKubeConfigFile".
Run kubeadm init --config kubeadm.yaml.
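The tagging step above can be sketched as a small helper that builds the `aws ec2 create-tags` command line. This is a sketch, not the author's procedure: the helper name and the instance and security group IDs are hypothetical, and the default cluster name `kubernetes` is the kubeadm value mentioned in the step. The command is only printed here; running it requires the AWS CLI and suitable credentials.

```python
import shlex
from typing import List


def create_tags_command(resource_ids: List[str], cluster_name: str = "kubernetes") -> List[str]:
    """Build the argv for `aws ec2 create-tags` applying the KubernetesCluster tag."""
    if not resource_ids:
        raise ValueError("at least one resource ID is required")
    return [
        "aws", "ec2", "create-tags",
        "--resources", *resource_ids,
        "--tags", f"Key=KubernetesCluster,Value={cluster_name}",
    ]


# Hypothetical EC2 instance and security group IDs; substitute your own.
command = create_tags_command(["i-0123456789abcdef0", "sg-0123456789abcdef0"])
print(shlex.join(command))
```

Building the argument list first and joining it with `shlex.join` only for display keeps the same list usable with `subprocess.run(command, check=True)` without any shell quoting concerns.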
kubeadm.yaml (Ansible template)
kind: MasterConfiguration
apiVersion: kubeadm.k8s.io/v1alpha1
api:
  advertiseAddress: {{ K8S_ADVERTISE_ADDRESS }}
networking:
  podSubnet: {{ K8S_SERVICE_ADDRESSES }}
cloudProvider: {{ K8S_CLOUD_PROVIDER }}
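To see what the rendered kubeadm.yaml might look like, the following sketch substitutes the three template variables using a small regular-expression stand-in for the Jinja2 rendering that Ansible performs. The values are assumptions for illustration only: the advertise address and pod subnet are hypothetical, and `aws` is the cloud provider this question is about.

```python
import re

KUBEADM_TEMPLATE = """\
kind: MasterConfiguration
apiVersion: kubeadm.k8s.io/v1alpha1
api:
  advertiseAddress: {{ K8S_ADVERTISE_ADDRESS }}
networking:
  podSubnet: {{ K8S_SERVICE_ADDRESSES }}
cloudProvider: {{ K8S_CLOUD_PROVIDER }}
"""


def render(template: str, variables: dict) -> str:
    """Replace each {{ NAME }} placeholder with its value; fail on unknown names."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value for template variable {name}")
        return str(variables[name])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Hypothetical values; only the cloud provider name comes from the question.
rendered = render(KUBEADM_TEMPLATE, {
    "K8S_ADVERTISE_ADDRESS": "172.31.4.117",
    "K8S_SERVICE_ADDRESSES": "10.244.0.0/16",
    "K8S_CLOUD_PROVIDER": "aws",
})
print(rendered)
```

The line that matters for the EBS problem is `cloudProvider: aws`; the other two values depend on your network layout.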
Result
$ journalctl -b -f CONTAINER_ID=$(docker ps | grep k8s_kube-controller-manager | awk '{ print $1 }')
Jan 02 04:48:28 ip-172-31-4-117.us-west-1.compute.internal dockerd-current[8063]: I0102 04:48:28.752141 1 reconciler.go:287] attacherDetacher.AttachVolume started for volume "kuard-pv" (UniqueName: "kubernetes.io/aws-ebs/vol-0d275986ce24f4304") from node "ip-172-3
Jan 02 04:48:39 ip-172-31-4-117.us-west-1.compute.internal dockerd-current[8063]: I0102 04:48:39.309178 1 operation_generator.go:308] AttachVolume.Attach succeeded for volume "kuard-pv" (UniqueName: "kubernetes.io/aws-ebs/vol-0d275986ce24f4304") from node "ip-172-
$ kubectl describe pod kuard
...
Volumes:
kuard-data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: kuard-pvc
ReadOnly: false
$ kubectl describe pv kuard-pv
Name: kuard-pv
Labels: failure-domain.beta.kubernetes.io/region=us-west-1
failure-domain.beta.kubernetes.io/zone=us-west-1b
type=amazonEBS
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"PersistentVolume","metadata":{"annotations":{},"labels":{"type":"amazonEBS"},"name":"kuard-pv","namespace":""},"spec":{"acce...
pv.kubernetes.io/bound-by-controller=yes
StorageClass:
Status: Bound
Claim: default/kuard-pvc
Reclaim Policy: Retain
Access Modes: RWO
Capacity: 5Gi
Message:
Source:
Type: AWSElasticBlockStore (a Persistent Disk resource in AWS)
VolumeID: vol-0d275986ce24f4304
FSType: ext4
Partition: 0
ReadOnly: false
Events: <none>
$ kubectl version -o json
{
"clientVersion": {
"major": "1",
"minor": "9",
"gitVersion": "v1.9.0",
"gitCommit": "925c127ec6b946659ad0fd596fa959be43f0cc05",
"gitTreeState": "clean",
"buildDate": "2017-12-15T21:07:38Z",
"goVersion": "go1.9.2",
"compiler": "gc",
"platform": "linux/amd64"
},
"serverVersion": {
"major": "1",
"minor": "9",
"gitVersion": "v1.9.0",
"gitCommit": "925c127ec6b946659ad0fd596fa959be43f0cc05",
"gitTreeState": "clean",
"buildDate": "2017-12-15T20:55:30Z",
"goVersion": "go1.9.2",
"compiler": "gc",
"platform": "linux/amd64"
}
}