Failed to do failover in 3 node patroni postgres-11 cluster - postgresql-11

I have a three-node Patroni postgres-11 cluster running. The Patroni cluster settings are:
consul:
  url: <consul URL>
bootstrap:
  initdb:
    encoding: UTF8
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      remove_data_directory_on_diverged_timelines: false
      remove_data_directory_on_rewind_failure: true
      parameters:
        archive_command: /bin/true
        archive_mode: 'on'
        archive_timeout: 1800s
        checkpoint_completion_target: 0.9
        default_statistics_target: 100
        effective_cache_size: 24GB
        effective_io_concurrency: 200
        hot_standby: 'on'
        log_directory: /var/log/postgresql
        log_filename: postgresql-%a.log
        logging_collector: 'on'
        maintenance_work_mem: 2GB
        max_connections: 2000
        max_parallel_maintenance_workers: 4
        max_parallel_workers: 8
        max_parallel_workers_per_gather: 4
        max_replication_slots: 10
        max_wal_senders: 10
        max_wal_size: 8GB
        max_worker_processes: 8
        min_wal_size: 2GB
        random_page_cost: 1.1
        shared_buffers: 8GB
        ssl: 'on'
        ssl_cert_file: /etc/ssl/certs/ssl-cert-snakeoil.pem
        ssl_key_file: /etc/ssl/private/ssl-cert-snakeoil.key
        track_commit_timestamp: 'on'
        wal_buffers: 16MB
        wal_keep_segments: 64
        wal_level: logical
        wal_log_hints: 'on'
        work_mem: 5242kB
postgresql:
  listen: 0.0.0.0:5432
  connect_address: <IP Address of the node>:5432
  data_dir: /var/lib/postgresql/11/<db name>
  pgpass: /tmp/pgpass0
  authentication:
    superuser:
      username: postgres
      password: postgres
    replication:
      username: replicator
      password: replicator
    rewind:
      username: rewind_user
      password: rewind_password
  pg_hba:
    - local all all peer
    - host all all 127.0.0.1/32 md5
    - hostssl all postgres 0.0.0.0/0 md5
    - host all postgres 0.0.0.0/0 reject
    - host all all 0.0.0.0/0 md5
    - hostssl all postgres ::0/0 md5
    - host all postgres ::0/0 reject
    - host all all ::0/0 md5
    - local replication all peer
    - host replication all 127.0.0.1/32 md5
    - hostssl replication replicator 0.0.0.0/0 md5
    - hostssl replication replicator ::0/0 md5
    - host replication replicator 0.0.0.0/0 md5
    - host replication replicator 127.0.0.1/32 trust
    - local postgres postgres peer
    - host all all 0.0.0.0/0 md5
  create_replica_methods:
    - basebackup
  basebackup:
    max-rate: '250M'
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
restapi:
  connect_address: <IP Address of the node>:8008
  listen: 0.0.0.0:8008
When I shut down the primary, I expected the synchronous standby to become the new primary. Instead, on both standby nodes I see the following message:
INFO: following a different leader because i am not the healthiest node
I am not sure why the synchronous standby did not become primary.
I am using Patroni 1.6.1 on Ubuntu 18.04.

I figured out the problem. For some reason the capital letters in my node names were converted to lowercase, so Patroni failed to find the standby nodes, like below:
{"leader":"Pa-db1","sync_standby":"pa-db2"}
The correct name of the node is Pa-db2, and similarly the second standby node is Pa-db3. I renamed all the nodes to lowercase, i.e. pa-db1, pa-db2 and pa-db3, and failover worked!
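If you hit the same symptom, it helps to compare the member names Patroni has registered with what is stored in the DCS before renaming anything. A minimal check, assuming a Patroni config at /etc/patroni/config.yml and the default service/ key prefix in Consul (adjust both to your setup):
# list cluster members as Patroni sees them (names come from each node's "name" setting)
patronictl -c /etc/patroni/config.yml list
# inspect the synchronous-standby record Patroni keeps in Consul; the value must
# match a registered member name, e.g. {"leader":"pa-db1","sync_standby":"pa-db2"}
consul kv get service/<cluster-name>/sync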

Related

Redis deployed in AWS - Connection time out from localhost SpringBoot app

A small question regarding Redis deployed in AWS (not AWS ElastiCache) and an issue connecting to it.
Here is the setup of the Redis deployed in AWS: (pasting only the Kubernetes StatefulSet and Service)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      initContainers:
        - name: config
          image: redis:7.0.5-alpine
          command: [ "sh", "-c" ]
          args:
            - |
              cp /tmp/redis/redis.conf /etc/redis/redis.conf
              echo "finding master..."
              MASTER_FDQN=`hostname -f | sed -e 's/redis-[0-9]\./redis-0./'`
              if [ "$(redis-cli -h sentinel -p 5000 ping)" != "PONG" ]; then
                echo "master not found, defaulting to redis-0"
                if [ "$(hostname)" = "redis-0" ]; then
                  echo "this is redis-0, not updating config..."
                else
                  echo "updating redis.conf..."
                  echo "slaveof $MASTER_FDQN 6379" >> /etc/redis/redis.conf
                fi
              else
                echo "sentinel found, finding master"
                MASTER="$(redis-cli -h sentinel -p 5000 sentinel get-master-addr-by-name mymaster | grep -E '(^redis-\d{1,})|([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})')"
                echo "master found : $MASTER, updating redis.conf"
                echo "slaveof $MASTER 6379" >> /etc/redis/redis.conf
              fi
          volumeMounts:
            - name: redis-config
              mountPath: /etc/redis/
            - name: config
              mountPath: /tmp/redis/
      containers:
        - name: redis
          image: redis:7.0.5-alpine
          command: ["redis-server"]
          args: ["/etc/redis/redis.conf"]
          ports:
            - containerPort: 6379
              name: redis
          volumeMounts:
            - name: data
              mountPath: /data
            - name: redis-config
              mountPath: /etc/redis/
      volumes:
        - name: redis-config
          emptyDir: {}
        - name: config
          configMap:
            name: redis-config
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: nfs-1
        resources:
          requests:
            storage: 50Mi
---
apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  ports:
    - port: 6379
      targetPort: 6379
      name: redis
  selector:
    app: redis
  type: LoadBalancer
The pods are healthy; I can exec into them and perform operations fine. Here is the kubectl get all output:
NAME READY STATUS RESTARTS AGE
pod/redis-0 1/1 Running 0 22h
pod/redis-1 1/1 Running 0 22h
pod/redis-2 1/1 Running 0 22h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/redis LoadBalancer 192.168.45.55 10.51.5.2 6379:30315/TCP 26h
NAME READY AGE
statefulset.apps/redis 3/3 22h
Here is the describe of the service:
Name: redis
Namespace: Namespace
Labels: <none>
Annotations: <none>
Selector: app=redis
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 192.168.22.33
IPs: 192.168.22.33
LoadBalancer Ingress: 10.51.5.2
Port: redis 6379/TCP
TargetPort: 6379/TCP
NodePort: redis 30315/TCP
Endpoints: 192.xxx:6379,192.xxx:6379,192.xxx:6379
Session Affinity: None
External Traffic Policy: Cluster
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal IPAllocated 68s metallb-controller Assigned IP ["10.51.5.2"]
Normal nodeAssigned 58s (x5 over 66s) metallb-speaker announcing from node "someaddress.com" with protocol "bgp"
Normal nodeAssigned 58s (x5 over 66s) metallb-speaker announcing from node "someaddress.com" with protocol "bgp"
I then try to connect to it, i.e. insert some data, with a very straightforward Spring Boot application. The application has no business logic; it just tries to insert data.
Here are the relevant parts:
@Configuration
public class RedisConfiguration {

    @Bean
    public ReactiveRedisConnectionFactory reactiveRedisConnectionFactory() {
        return new LettuceConnectionFactory("10.51.5.2", 30315);
    }
}

@Repository
public class RedisRepository {

    private final ReactiveRedisOperations<String, String> reactiveRedisOperations;

    public RedisRepository(ReactiveRedisOperations<String, String> reactiveRedisOperations) {
        this.reactiveRedisOperations = reactiveRedisOperations;
    }

    public Mono<RedisPojo> save(RedisPojo redisPojo) {
        return reactiveRedisOperations.opsForValue().set(redisPojo.getInput(), redisPojo.getOutput()).map(__ -> redisPojo);
    }
}
Each time I try to write data, I get this exception:
2022-12-02T20:20:08.015+08:00 ERROR 1184 --- [ctor-http-nio-3] a.w.r.e.AbstractErrorWebExceptionHandler : [8f16a752-1] 500 Server Error for HTTP POST "/save"
org.springframework.data.redis.RedisConnectionFailureException: Unable to connect to Redis
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$ExceptionTranslatingConnectionProvider.translateException(LettuceConnectionFactory.java:1602) ~[spring-data-redis-3.0.0.jar:3.0.0]
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
*__checkpoint ⇢ Handler com.redis.controller.RedisController#test(RedisRequest) [DispatcherHandler]
*__checkpoint ⇢ HTTP POST "/save" [ExceptionHandlingWebHandler]
Original Stack Trace:
at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$ExceptionTranslatingConnectionProvider.translateException(LettuceConnectionFactory.java:1602) ~[spring-data-redis-3.0.0.jar:3.0.0]
Caused by: io.lettuce.core.RedisConnectionException: Unable to connect to 10.51.5.2/<unresolved>:30315
at io.lettuce.core.RedisConnectionException.create(RedisConnectionException.java:78) ~[lettuce-core-6.2.1.RELEASE.jar:6.2.1.RELEASE]
at io.lettuce.core.RedisConnectionException.create(RedisConnectionException.java:56) ~[lettuce-core-6.2.1.RELEASE.jar:6.2.1.RELEASE]
at io.lettuce.core.AbstractRedisClient.getConnection(AbstractRedisClient.java:350) ~[lettuce-core-6.2.1.RELEASE.jar:6.2.1.RELEASE]
at io.lettuce.core.RedisClient.connect(RedisClient.java:216) ~[lettuce-core-6.2.1.RELEASE.jar:6.2.1.RELEASE]
Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: /10.51.5.2:30315
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:261) ~[netty-transport-4.1.85.Final.jar:4.1.85.Final]
at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) ~[netty-common-4.1.85.Final.jar:4.1.85.Final]
This is particularly puzzling, because I am quite sure the code of the Spring Boot app is working. When I point return new LettuceConnectionFactory("10.51.5.2", 30315); at:
a regular Redis on my laptop ("localhost", 6379),
a dockerized Redis on my laptop, or
a dockerized Redis on prem,
everything works fine.
Therefore, I am quite puzzled about what I did wrong with the setup of this Redis in AWS.
What should I do in order to connect to it properly?
May I get some help please?
Thank you.
By default, Redis binds itself to the IP addresses 127.0.0.1 and ::1 and does not accept connections on non-local interfaces. Chances are high that this is your main issue, and you may want to review your redis.conf file to bind Redis to the interface you need, or to the generic * -::*, as explained in the comments of the config file itself.
With that being said, Redis also does not accept connections on non-local interfaces if the default user has no password - a security layer named protected mode. Thus you should either give your default user a password or disable protected mode in your redis.conf file.
Not sure if this applies to your case, but as a side note I would suggest always avoiding exposing Redis to the Internet.
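For example, the relevant redis.conf directives could look like the sketch below (adjust to your own security requirements; the requirepass value is a placeholder):
# listen on all interfaces instead of only 127.0.0.1 / ::1
bind * -::*
# preferably keep protected mode on and give the default user a password ...
requirepass <strong-password>
# ... or, less safely, disable protected mode entirely
protected-mode no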
You are mixing two things.
To reach this service from pods in other namespaces you do not need an external load balancer; you can simply use the redis.namespace-name:6379 DNS name and it will just work. Such a DNS name exists for every service you create (but it only resolves inside Kubernetes).
Kubernetes will make sure that your traffic is routed to the proper pods (assuming there is more than one).
If you want to expose Redis from outside of Kubernetes, then you need to make sure there is connectivity from the outside, and then you need a network load balancer that forwards traffic to your Kubernetes service (in your case the NodePort, so you need an NLB with eks-worker-nodes:30315 as targets).
If your worker nodes have public IPs and their security groups allow connecting to them directly, you could try connecting to a worker node's IP directly just to test things out (without an LB).
And regardless of your setup, you can always create a proxy via kubectl:
kubectl port-forward -n redisNS svc/redis 6379:6379
and connect from the Spring Boot app to localhost:6379.
How do you want to connect from the app to Redis in the final setup?
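If the Spring Boot app itself ends up running inside the cluster, the connection factory could use the in-cluster DNS name instead of the LoadBalancer IP and NodePort. A minimal sketch, assuming the Service lives in a namespace called redis-ns (a placeholder) and keeping the rest of the configuration class unchanged:

@Bean
public ReactiveRedisConnectionFactory reactiveRedisConnectionFactory() {
    // "redis.redis-ns" only resolves inside the cluster; 6379 is the Service port, not the NodePort
    return new LettuceConnectionFactory("redis.redis-ns", 6379);
}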

Promtail End Point Failure At /loki/api/v1/push On EC2 Instance Via Docker

I am using an AWS instance and I am trying to run Promtail in order to fetch logs and forward them to a Loki server. Promtail, Loki and Grafana are run through Docker.
The Loki server is running on port 3100, Promtail on 3400 and Grafana on 8001. Since it is an AWS platform, what needs to be done so that it stops throwing an error at the http://43.206.43.87:3100/loki/api/v1/push endpoint?
Here is my promtail-config.yaml:
server:
  http_listen_port: 3400
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://43.206.43.87:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - 43.206.43.87
        labels:
          job: varlogs
          __path__: /var/log/*log
Here is my loki-config.yaml:
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 0

common:
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    instance_addr: 43.206.43.87
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://localhost:9093
Please help me out
All I had to do was change the loki-config.yaml file so that the ring section reads
instance_addr: localhost
and everything worked.
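For reference, the corrected common section of loki-config.yaml would then look roughly like this (everything else stays as in the question above):

common:
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
  replication_factor: 1
  ring:
    instance_addr: localhost   # was 43.206.43.87
    kvstore:
      store: inmemory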

How to connect from a public GKE pod to a GCP Cloud SQL using a private connection

I have a Java application running in a Docker container. I am deploying all this to my GKE cluster. I'd like to have it connected to a CloudSQL instance via private IP. However, I have been struggling for two days now to get it working. I followed this guide:
https://cloud.google.com/sql/docs/mysql/configure-private-services-access#gcloud_1
I managed to create a private service connection and also gave my CloudSQL instance the address range. As far as I understood, this should be sufficient for the Pod to be able to connect to the CloudSQL instance.
However, it just does not work. I pass the private IP from CloudSQL as the host for the Java application's JDBC (database) connection.
2022-02-14 22:03:31.299 WARN 1 --- [ main] o.h.e.j.e.i.JdbcEnvironmentInitiator : HHH000342: Could not obtain connection to query metadata

com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
Here are some details about the problem.
The address allocation
➜ google-cloud-sdk gcloud compute addresses list
NAME ADDRESS/RANGE TYPE PURPOSE NETWORK REGION SUBNET STATUS
google-managed-services-default 10.77.0.0/16 INTERNAL VPC_PEERING default RESERVED
The vpc peering connection
➜ google-cloud-sdk gcloud services vpc-peerings list --network=default
---
network: projects/1071923183712/global/networks/default
peering: servicenetworking-googleapis-com
reservedPeeringRanges:
- google-managed-services-default
service: services/servicenetworking.googleapis.com
Here is my CloudSQL info. Please note that the PRIVATE IP address is 10.77.0.5 and therefore matches the address range 10.77.0.0/16 from above. I guess this part is working.
➜ google-cloud-sdk gcloud sql instances describe alpha-3
backendType: SECOND_GEN
connectionName: barbarus:europe-west4:alpha-3
createTime: '2022-02-14T19:28:02.465Z'
databaseInstalledVersion: MYSQL_5_7_36
databaseVersion: MYSQL_5_7
etag: 758de240b161b946689e5732d8e71d396c772c0e03904c46af3b61f59b1038a0
gceZone: europe-west4-a
instanceType: CLOUD_SQL_INSTANCE
ipAddresses:
- ipAddress: 34.90.174.243
  type: PRIMARY
- ipAddress: 10.77.0.5
  type: PRIVATE
kind: sql#instance
name: alpha-3
project: barbarus
region: europe-west4
selfLink: https://sqladmin.googleapis.com/sql/v1beta4/projects/barbarus/instances/alpha-3
serverCaCert:
  cert: |-
    -----BEGIN CERTIFICATE-----
    //...
    -----END CERTIFICATE-----
  certSerialNumber: '0'
  commonName: C=US,O=Google\, Inc,CN=Google Cloud SQL Server CA,dnQualifier=d495898b-f6c7-4e2f-9c59-c02ccf2c1395
  createTime: '2022-02-14T19:29:35.325Z'
  expirationTime: '2032-02-12T19:30:35.325Z'
  instance: alpha-3
  kind: sql#sslCert
  sha1Fingerprint: 3ee799b139bf335ef39554b07a5027c9319087cb
serviceAccountEmailAddress: p1071923183712-d99fsz@gcp-sa-cloud-sql.iam.gserviceaccount.com
settings:
  activationPolicy: ALWAYS
  availabilityType: ZONAL
  backupConfiguration:
    backupRetentionSettings:
      retainedBackups: 7
      retentionUnit: COUNT
    binaryLogEnabled: true
    enabled: true
    kind: sql#backupConfiguration
    location: us
    startTime: 12:00
    transactionLogRetentionDays: 7
  dataDiskSizeGb: '10'
  dataDiskType: PD_HDD
  ipConfiguration:
    allocatedIpRange: google-managed-services-default
    ipv4Enabled: true
    privateNetwork: projects/barbarus/global/networks/default
    requireSsl: false
  kind: sql#settings
  locationPreference:
    kind: sql#locationPreference
    zone: europe-west4-a
  pricingPlan: PER_USE
  replicationType: SYNCHRONOUS
  settingsVersion: '1'
  storageAutoResize: true
  storageAutoResizeLimit: '0'
  tier: db-f1-micro
state: RUNNABLE
The problem I see is with the Pod's IP address: it is 10.0.5.3, which is not in the 10.77.0.0/16 range, and therefore the pod can't see the CloudSQL instance.
Here is the Pod's info:
Name: game-server-5b9dd47cbd-vt2gw
Namespace: default
Priority: 0
Node: gke-barbarus-node-pool-1a5ea7d5-bg3m/10.164.15.216
Start Time: Tue, 15 Feb 2022 00:33:56 +0100
Labels: app=game-server
app.kubernetes.io/managed-by=gcp-cloud-build-deploy
pod-template-hash=5b9dd47cbd
Annotations: <none>
Status: Running
IP: 10.0.5.3
IPs:
IP: 10.0.5.3
Controlled By: ReplicaSet/game-server-5b9dd47cbd
Containers:
game-server:
Container ID: containerd://57d9540b1e5f5cb3fcc4517fa42377282943d292ba810c83cd7eb50bd4f1e3dd
Image: eu.gcr.io/barbarus/game-server@sha256:72d518a53652d32d0d438d2a5443c44cc8e12bb15cb1a59c843ce72466900141
Image ID: eu.gcr.io/barbarus/game-server@sha256:72d518a53652d32d0d438d2a5443c44cc8e12bb15cb1a59c843ce72466900141
Port: <none>
Host Port: <none>
State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 15 Feb 2022 00:36:48 +0100
Finished: Tue, 15 Feb 2022 00:38:01 +0100
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 15 Feb 2022 00:35:23 +0100
Finished: Tue, 15 Feb 2022 00:36:35 +0100
Ready: False
Restart Count: 2
Environment:
SQL_CONNECTION: <set to the key 'SQL_CONNECTION' of config map 'game-server'> Optional: false
SQL_USER: <set to the key 'SQL_USER' of config map 'game-server'> Optional: false
SQL_DATABASE: <set to the key 'SQL_DATABASE' of config map 'game-server'> Optional: false
SQL_PASSWORD: <set to the key 'SQL_PASSWORD' of config map 'game-server'> Optional: false
LOG_LEVEL: <set to the key 'LOG_LEVEL' of config map 'game-server'> Optional: false
WORLD_ID: <set to the key 'WORLD_ID' of config map 'game-server'> Optional: false
WORLD_SIZE: <set to the key 'WORLD_SIZE' of config map 'game-server'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sknlk (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-sknlk:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m7s default-scheduler Successfully assigned default/game-server-5b9dd47cbd-vt2gw to gke-barbarus-node-pool-1a5ea7d5-bg3m
Normal Pulling 4m6s kubelet Pulling image "eu.gcr.io/barbarus/game-server@sha256:72d518a53652d32d0d438d2a5443c44cc8e12bb15cb1a59c843ce72466900141"
Normal Pulled 3m55s kubelet Successfully pulled image "eu.gcr.io/barbarus/game-server@sha256:72d518a53652d32d0d438d2a5443c44cc8e12bb15cb1a59c843ce72466900141" in 11.09487284s
Normal Created 75s (x3 over 3m54s) kubelet Created container game-server
Normal Started 75s (x3 over 3m54s) kubelet Started container game-server
Normal Pulled 75s (x2 over 2m41s) kubelet Container image "eu.gcr.io/barbarus/game-server@sha256:72d518a53652d32d0d438d2a5443c44cc8e12bb15cb1a59c843ce72466900141" already present on machine
Warning BackOff 1s (x2 over 87s) kubelet Back-off restarting failed container
Finally this is what gcloud container clusters describe gives me:
➜ google-cloud-sdk gcloud container clusters describe --region=europe-west4 barbarus
addonsConfig:
  gcePersistentDiskCsiDriverConfig:
    enabled: true
  kubernetesDashboard:
    disabled: true
  networkPolicyConfig:
    disabled: true
autopilot: {}
autoscaling:
  autoscalingProfile: BALANCED
binaryAuthorization: {}
clusterIpv4Cidr: 10.0.0.0/14
createTime: '2022-02-14T19:34:03+00:00'
currentMasterVersion: 1.21.6-gke.1500
currentNodeCount: 3
currentNodeVersion: 1.21.6-gke.1500
databaseEncryption:
  state: DECRYPTED
endpoint: 34.141.141.150
id: 39e7249b48c24d23a8b70b0c11cd18901565336b397147dab4778dc75dfc34e2
initialClusterVersion: 1.21.6-gke.1500
initialNodeCount: 1
instanceGroupUrls:
- https://www.googleapis.com/compute/v1/projects/barbarus/zones/europe-west4-a/instanceGroupManagers/gke-barbarus-node-pool-e291e3d6-grp
- https://www.googleapis.com/compute/v1/projects/barbarus/zones/europe-west4-b/instanceGroupManagers/gke-barbarus-node-pool-5aa35c39-grp
- https://www.googleapis.com/compute/v1/projects/barbarus/zones/europe-west4-c/instanceGroupManagers/gke-barbarus-node-pool-380645b7-grp
ipAllocationPolicy:
  useRoutes: true
labelFingerprint: a9dc16a7
legacyAbac: {}
location: europe-west4
locations:
- europe-west4-a
- europe-west4-b
- europe-west4-c
loggingConfig:
  componentConfig:
    enableComponents:
    - SYSTEM_COMPONENTS
    - WORKLOADS
loggingService: logging.googleapis.com/kubernetes
maintenancePolicy:
  resourceVersion: e3b0c442
masterAuth:
  clusterCaCertificate: // ...
masterAuthorizedNetworksConfig: {}
monitoringConfig:
  componentConfig:
    enableComponents:
    - SYSTEM_COMPONENTS
monitoringService: monitoring.googleapis.com/kubernetes
name: barbarus
network: default
networkConfig:
  defaultSnatStatus: {}
  network: projects/barbarus/global/networks/default
  serviceExternalIpsConfig: {}
  subnetwork: projects/barbarus/regions/europe-west4/subnetworks/default
nodeConfig:
  diskSizeGb: 100
  diskType: pd-standard
  imageType: COS_CONTAINERD
  machineType: e2-medium
  metadata:
    disable-legacy-endpoints: 'true'
  oauthScopes:
  - https://www.googleapis.com/auth/cloud-platform
  preemptible: true
  serviceAccount: default@barbarus.iam.gserviceaccount.com
  shieldedInstanceConfig:
    enableIntegrityMonitoring: true
nodeIpv4CidrSize: 24
nodePoolDefaults:
  nodeConfigDefaults: {}
nodePools:
- config:
    diskSizeGb: 100
    diskType: pd-standard
    imageType: COS_CONTAINERD
    machineType: e2-medium
    metadata:
      disable-legacy-endpoints: 'true'
    oauthScopes:
    - https://www.googleapis.com/auth/cloud-platform
    preemptible: true
    serviceAccount: default@barbarus.iam.gserviceaccount.com
    shieldedInstanceConfig:
      enableIntegrityMonitoring: true
  initialNodeCount: 1
  instanceGroupUrls:
  - https://www.googleapis.com/compute/v1/projects/barbarus/zones/europe-west4-a/instanceGroupManagers/gke-barbarus-node-pool-e291e3d6-grp
  - https://www.googleapis.com/compute/v1/projects/barbarus/zones/europe-west4-b/instanceGroupManagers/gke-barbarus-node-pool-5aa35c39-grp
  - https://www.googleapis.com/compute/v1/projects/barbarus/zones/europe-west4-c/instanceGroupManagers/gke-barbarus-node-pool-380645b7-grp
  locations:
  - europe-west4-a
  - europe-west4-b
  - europe-west4-c
  management:
    autoRepair: true
    autoUpgrade: true
  name: node-pool
  podIpv4CidrSize: 24
  selfLink: https://container.googleapis.com/v1/projects/barbarus/locations/europe-west4/clusters/barbarus/nodePools/node-pool
  status: RUNNING
  upgradeSettings:
    maxSurge: 1
  version: 1.21.6-gke.1500
notificationConfig:
  pubsub: {}
releaseChannel:
  channel: REGULAR
selfLink: https://container.googleapis.com/v1/projects/barbarus/locations/europe-west4/clusters/barbarus
servicesIpv4Cidr: 10.3.240.0/20
shieldedNodes:
  enabled: true
status: RUNNING
subnetwork: default
zone: europe-west4
I have no idea how I can give the pod a reference to the address allocation I made for the private service connection.
What I tried is to spin up a GKE cluster with a cluster default pod address range of 10.77.0.0/16, which sounded logical since I want the pods to appear in the same address range as the CloudSQL instance. However, GCP gives me an error when I try to do that:
(1) insufficient regional quota to satisfy request: resource "CPUS": request requires '9.0' and is short '1.0'. project has a quota of '8.0' with '8.0' available. View and manage quotas at https://console.cloud.google.com/iam-admin/quotas?usage=USED&project=hait-barbarus (2) insufficient regional quota to satisfy request: resource "IN_USE_ADDRESSES": request requires '9.0' and is short '5.0'. project has a quota of '4.0' with '4.0' available. View and manage quotas at https://console.cloud.google.com/iam-admin/quotas?usage=USED&project=hait-barbarus.
So I am not able to give the pods the proper address range for the private service connection; how can they ever discover the CloudSQL instance?
EDIT #1: The GKE cluster's service account has the SQL Client role.
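One way to narrow down where it breaks, independent of the application, is to test raw connectivity to the instance's private IP from a throwaway pod in the cluster. A diagnostic sketch (the mysql:5.7 image and <user> are illustrative; use whatever database user exists on the instance):
kubectl run sqltest --rm -it --restart=Never --image=mysql:5.7 -- \
  mysql -h 10.77.0.5 -u <user> -p
If this connects, the VPC peering and routing are fine and the problem is in the application configuration; if it times out, the pods really cannot reach the private range.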

How to correctly use dynamic inventories with Ansible?

I am trying to provide initial configuration and software installation to a newly created AWS EC2 instance by using Ansible. If I run my playbooks independently, they work just as I want. However, if I try to automate them into a single playbook by using two imports, it doesn't work (probably because the dynamic inventory can't pick up the newly created IP address?)...
Running together:
[WARNING]: Could not match supplied host pattern, ignoring:
aws_region_eu_central_1
PLAY [variables from dynamic inventory] ****************************************
skipping: no hosts matched
Running separately:
TASK [Gathering Facts] *********************************************************
[WARNING]: Platform linux on host XX.XX.XX.XX is using the discovered Python
interpreter at /usr/bin/python, but future installation of another Python
interpreter could change the meaning of that path. See https://docs.ansible.com
/ansible/2.10/reference_appendices/interpreter_discovery.html for more
information.
ok: [XX.XX.XX.XX]
This is my main playbook:
- import_playbook: server-setup.yml
- import_playbook: server-configuration.yml
server-setup.yml:
---
# variables from dynamic inventory
- name: variables from dynamic inventory
  remote_user: ec2-user
  hosts: localhost
  roles:
    - ec2-instance
server-configuration.yml:
---
# variables from dynamic inventory
- name: variables from dynamic inventory
  remote_user: ec2-user
  become: true
  become_method: sudo
  become_user: root
  ignore_unreachable: true
  hosts: aws_region_eu_central_1
  gather_facts: false
  pre_tasks:
    - pause:
        minutes: 5
  roles:
    - { role: epel, sudo: true }
    - { role: nodejs, sudo: true }
This is my ansible.cfg file:
[defaults]
inventory = test_aws_ec2.yaml
private_key_file = master-key.pem
enable_plugins = aws_ec2
host_key_checking = False
pipelining = True
log_path = ansible.log
roles_path = /roles
forks = 1000
and finally my hosts.ini:
[local]
localhost ansible_python_interpreter=/usr/local/bin/python3
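One thing the combined run does not do by itself is re-read the aws_ec2 dynamic inventory after the instance has been created, which would explain the "no hosts matched" warning above. A sketch of a refresh step between the two imports in the main playbook (an assumption about the cause, not a verified fix):

- import_playbook: server-setup.yml

- name: refresh the dynamic inventory so the new instance's groups exist
  hosts: localhost
  gather_facts: false
  tasks:
    - meta: refresh_inventory

- import_playbook: server-configuration.yml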

Kitchen-EC2 SSH prompting password for an instance inside VPC

I am trying to spin up an EC2 instance inside a VPC on a private subnet. Every time I run kitchen test, I am able to spin up the instance with the right security groups and in the right subnet range. When test-kitchen tries to SSH onto the instance, it asks for a password. However, when I manually SSH (ssh <private_ip> -i <path_to_ssh_key> -l ubuntu) onto the machine, I succeed without being prompted for a password.
The following is my .kitchen.yml file
---
driver:
  name: ec2
  aws_ssh_key_id: id-spanning
  security_group_ids: ['sg-9....5']
  region: us-east-1
  availability_zone: us-east-1a
  require_chef_omnibus: true
  subnet_id: subnet-5...0
  associate_public_ip: false
  instance_type: m3.medium
  interface: private
transport:
  ssh_key: ~/.ssh/id-spanning.pem
  connection_timeout: 10
  connection_retries: 5
  username: ubuntu
provisioner:
  name: chef_solo
platforms:
  - name: Ubuntu-14.04
    driver:
      image_id: ami-8821cae0
suites:
  - name: default
    run_list:
    attributes:
I have the AWS credentials in place in the environment variables. The following is my output:
kitchen test
-----> Starting Kitchen (v1.4.0)
-----> Cleaning up any prior instances of <default-Ubuntu-1404>
-----> Destroying <default-Ubuntu-1404>...
EC2 instance <i-16f468c6> destroyed.
Finished destroying <default-Ubuntu-1404> (0m1.90s).
-----> Testing <default-Ubuntu-1404>
-----> Creating <default-Ubuntu-1404>...
Creating <>...
If you are not using an account that qualifies under the AWS
free-tier, you may be charged to run these suites. The charge
should be minimal, but neither Test Kitchen nor its maintainers
are responsible for your incurred costs.
Instance <i-8fad345f> requested.
EC2 instance <i-8fad345f> created.
Waited 0/300s for instance <i-8fad345f> to become ready.
Waited 5/300s for instance <i-8fad345f> to become ready.
Waited 10/300s for instance <i-8fad345f> to become ready.
Waited 15/300s for instance <i-8fad345f> to become ready.
Waited 20/300s for instance <i-8fad345f> to become ready.
Waited 25/300s for instance <i-8fad345f> to become ready.
EC2 instance <i-8fad345f> ready.
Password:
I tried several times and haven't had any luck bypassing the password prompt to let test-kitchen SSH onto the instance. The following is my kitchen diagnose output:
---
timestamp: 2015-05-26 15:34:29 UTC
kitchen_version: 1.4.0
instances:
  default-Ubuntu-1404:
    platform:
      os_type: unix
      shell_type: bourne
    state_file:
      hostname: ''
      server_id: i-1.....6
    driver:
      associate_public_ip: false
      availability_zone: us-east-1a
      aws_access_key_id:
      aws_secret_access_key:
      aws_session_token:
      aws_ssh_key_id: id-spanning
      block_device_mappings:
      ebs_optimized: false
      flavor_id:
      iam_profile_name:
      image_id: ami-8821cae0
      instance_type: m3.medium
      interface: private
      kitchen_root: "/Users/jonnas2/Desktop/apache101"
      log_level: :info
      name: ec2
      price:
      private_ip_address:
      region: us-east-1
      retryable_sleep: 5
      retryable_tries: 60
      security_group_ids:
      - sg-9....5
      shared_credentials_profile:
      subnet_id: subnet-5....0
      tags:
        created-by: test-kitchen
      test_base_path: "/Users/jonnas2/Desktop/apache101/test/integration"
      user_data:
      username:
    provisioner:
      attributes: {}
      chef_metadata_url:
      chef_omnibus_install_options:
      chef_omnibus_root: "/opt/chef"
      chef_omnibus_url: https://www.chef.io/chef/install.sh
      chef_solo_path: "/opt/chef/bin/chef-solo"
      clients_path:
      cookbook_files_glob: README.*,metadata.{json,rb},attributes/**/*,definitions/**/*,files/**/*,libraries/**/*,providers/**/*,recipes/**/*,resources/**/*,templates/**/*
      data_bags_path:
      data_path:
      encrypted_data_bag_secret_key_path:
      environments_path:
      http_proxy:
      https_proxy:
      kitchen_root: "/Users/jonnas2/Desktop/apache101"
      log_file:
      log_level: :info
      name: chef_solo
      nodes_path:
      require_chef_omnibus: true
      roles_path:
      root_path: "/tmp/kitchen"
      run_list: []
      solo_rb: {}
      sudo: true
      sudo_command: sudo -E
      test_base_path: "/Users/jonnas2/Desktop/apache101/test/integration"
    transport:
      compression: zlib
      compression_level: 6
      connection_retries: 5
      connection_retry_sleep: 1
      connection_timeout: 10
      keepalive: true
      keepalive_interval: 60
      kitchen_root: "/Users/jonnas2/Desktop/apache101"
      log_level: :info
      max_wait_until_ready: 600
      name: ssh
      port: 22
      ssh_key: "/Users/jonnas2/.ssh/id-spanning.pem"
      test_base_path: "/Users/jonnas2/Desktop/apache101/test/integration"
      username: ubuntu
    verifier:
      busser_bin: "/tmp/verifier/bin/busser"
      http_proxy:
      https_proxy:
      kitchen_root: "/Users/jonnas2/Desktop/apache101"
      log_level: :info
      name: busser
      root_path: "/tmp/verifier"
      ruby_bindir: "/opt/chef/embedded/bin"
      sudo: true
      sudo_command: sudo -E
      suite_name: default
      test_base_path: "/Users/jonnas2/Desktop/apache101/test/integration"
      version: busser
versions used:
test-kitchen 1.4.0
kitchen-ec2 0.9.0
Any help would be greatly appreciated. Thanks.
This issue was resolved by test-kitchen 1.4.1. A fix was merged (https://github.com/test-kitchen/test-kitchen/pull/704) into core test-kitchen which disables password auth if an ssh_key is configured.
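If you were pinned to an older release, upgrading and confirming the version in use is enough to pick up the change. A sketch, assuming Test Kitchen was installed as a plain gem (ChefDK users would update ChefDK instead):
# install a release that contains the fix, then confirm which version kitchen resolves to
gem install test-kitchen -v 1.4.1
kitchen --version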