Google Deployment Manager - database creation fails - google-cloud-platform

I'm trying to create a CloudSQL instance and two databases using Google Deployment Manager. I can't get a reliable first-time deployment where both databases create successfully. Instead, each time I run it, one (or both!) fail with the status "FAILED_PRECONDITION", error message "Bad Request" and no further explanation as to which precondition has failed or how to fix it. Has anyone else come across this before or have any clues on how I can find the issue?
The properties {{ SQL_NAME }} etc. are all defined at the top of my Jinja template, but I've omitted them here for clarity.
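They're set with ordinary Jinja set statements, roughly like this (the exact expressions here are illustrative, not the ones from my real template):

{% set SQL_NAME = env["deployment"] + "-sql" %}
{% set DB_NAME = env["deployment"] + "-db1" %}
{% set DB2_NAME = env["deployment"] + "-db2" %}
{% set USER_NAME = env["deployment"] + "-dbroot" %}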
resources:
- name: {{ SQL_NAME }}
  type: sqladmin.v1beta4.instance
  properties:
    backendType: SECOND_GEN
    instanceType: CLOUD_SQL_INSTANCE
    region: {{ properties["region"] }}
    databaseVersion: {{ properties["dbType"] }}
    settings:
      tier: db-n1-standard-1
      dataDiskSizeGb: 10
      dataDiskType: PD_SSD
      storageAutoResize: true
      replicationType: SYNCHRONOUS
      locationPreference:
        zone: {{ properties['zone'] }}
      ipConfiguration:
        privateNetwork: {{ properties["network"] }}
- name: {{ DB_NAME }}
  type: sqladmin.v1beta4.database
  properties:
    name: db1
    instance: $(ref.{{ SQL_NAME }}.name)
    charset: utf8
    collation: utf8_general_ci
  metadata:
    dependsOn:
    - {{ SQL_NAME }}
- name: {{ DB2_NAME }}
  type: sqladmin.v1beta4.database
  properties:
    name: db2
    instance: $(ref.{{ SQL_NAME }}.name)
    charset: utf8
  metadata:
    dependsOn:
    - {{ SQL_NAME }}
- name: {{ USER_NAME }}
  type: sqladmin.v1beta4.user
  properties:
    name: dbroot
    host: "%"
    instance: $(ref.{{ SQL_NAME }}.name)
    password: {{ properties['password'] }}
  metadata:
    dependsOn:
    - {{ SQL_NAME }}

So, I found the answer. It turns out Google's error message is even less helpful than I thought when I first hit the issue. What it seems to be (I still have no concrete evidence that this was the failed precondition, but the change below solves it) is that you can't create two databases at the same time on the same Cloud SQL instance, and Deployment Manager tries to do exactly that, because both databases depend only on the Cloud SQL instance itself. I solved the issue by making each successive resource depend on the previous one:
- name: {{ DB_NAME }}
  type: sqladmin.v1beta4.database
  properties:
    name: db1
    instance: $(ref.{{ SQL_NAME }}.name)
    charset: utf8
    collation: utf8_general_ci
  metadata:
    dependsOn:
    - {{ SQL_NAME }}
- name: {{ DB2_NAME }}
  type: sqladmin.v1beta4.database
  properties:
    name: db2
    instance: $(ref.{{ SQL_NAME }}.name)
    charset: utf8
  metadata:
    dependsOn:
    - {{ SQL_NAME }}
    - {{ DB_NAME }}
- name: {{ USER_NAME }}
  type: sqladmin.v1beta4.user
  properties:
    name: dbroot
    host: "%"
    instance: $(ref.{{ SQL_NAME }}.name)
    password: {{ properties['password'] }}
  metadata:
    dependsOn:
    - {{ SQL_NAME }}
    - {{ DB2_NAME }}
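For anyone wanting to sanity-check the fix before committing to it, the template can be previewed and then applied with the usual Deployment Manager commands (the deployment and config file names below are placeholders):

gcloud deployment-manager deployments create sql-demo --config config.yaml --preview
gcloud deployment-manager deployments update sql-demo
gcloud deployment-manager deployments describe sql-demo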

BTW, my coworker just discovered today that, as of right now, explicitly setting backendType to SECOND_GEN together with databaseVersion: MYSQL_5_6 also yields a 400 error, even though you can still use that combination in the console. This looks like a very recent API break. Just a heads up.

Related

Rabbit cluster in k8s slower than EC2

Right now we are running RabbitMQ on 5 EC2 instances (AWS) and we are trying to migrate to a k8s cluster.
We deployed a cluster with EKS that works fine up to 45K users.
The 5 separate instances can handle 75K users.
We discovered that latency is higher in the k8s cluster than with the direct connection to the EC2 instances.
We used this tool: https://www.rabbitmq.com/rabbitmq-diagnostics.8.html and didn't find a problem; the file descriptors, memory, CPU, etc. all look fine.
We deployed with https://github.com/rabbitmq/cluster-operator
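The diagnostics checks mentioned above would look roughly like this when run against the operator-managed pods (the pod name here is illustrative):

kubectl -n berlin exec -it rabbitmq-server-0 -- rabbitmq-diagnostics status
kubectl -n berlin exec -it rabbitmq-server-0 -- rabbitmq-diagnostics check_running
kubectl -n berlin exec -it rabbitmq-server-0 -- rabbitmq-diagnostics memory_breakdown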
values.yaml
serviceName: rabbitmq
namespace: berlin
regionCode: use1
env: dev
resourcesConfig:
  replicas: 9
  nodeGroupName: r-large
  storageType: gp2
  storageSize: 100Gi
  resources:
    limits:
      cpu: 8
      memory: 60Gi
    requests:
      cpu: 7
      memory: 60Gi
definitionsConf:
  vhosts:
  - name: /
  exchanges:
  - name: test
    vhost: /
    type: direct
    durable: true
    auto_delete: false
    internal: false
    arguments: {}
  policies:
  - vhost: /
    name: Test Policy
    pattern: test.*.*.*
    apply-to: queues
    definition:
      federation-upstream-set: all
    priority: 0
additionalPlugins:
- rabbitmq_event_exchange
- rabbitmq_auth_backend_cache
- rabbitmq_auth_backend_http
- rabbitmq_prometheus
- rabbitmq_shovel
rabbitmqConf:
  load_definitions: /etc/rabbitmq/definitions.json
  # definitions.skip_if_unchanged: 'true'
  cluster_partition_handling: pause_minority
  auth_backends.1: cache
  auth_cache.cached_backend: http
  auth_cache.cache_ttl: '10000'
  auth_http.http_method: post
  auth_http.user_path: http://XXXX:3000/authentication/users
  auth_http.vhost_path: http://XXX:3000/authentication/vhosts
  auth_http.resource_path: http://XXX:3000/authentication/resources
  auth_http.topic_path: http://XXX:3000/authentication/topics
  prometheus.path: /metrics
  prometheus.tcp.port: '15692'
  log.console: 'true'
  log.console.level: error
  log.console.formatter: json
  log.default.level: error
  tcp_listen_options.backlog: '4096'
  tcp_listen_options.nodelay: 'true'
  tcp_listen_options.sndbuf: '32768'
  tcp_listen_options.recbuf: '32768'
  tcp_listen_options.keepalive: 'true'
  tcp_listen_options.linger.on: 'true'
  tcp_listen_options.linger.timeout: '0'
  disk_free_limit.relative: '1.0'
  num_acceptors.tcp: '40'
  hipe_compile: 'true'
  collect_statistics_interval: '30000'
  mnesia_table_loading_retry_timeout: '60000'
  heartbeat: '60'
  vm_memory_high_watermark.relative: '0.9'
  management_agent.disable_metrics_collector: 'true'
  management.disable_stats: 'true'
metricsConfig:
  metricsPath: /metrics
  metricsPort: '15692'
Chart.yaml
apiVersion: v2
name: rabbitmq
description: RabbitMQ Cluster
type: application
version: 0.0.1
charts/templates/configmap.yaml
{{- $varNamespace := .Values.namespace }}
apiVersion: v1
kind: ConfigMap
metadata:
namespace: {{ .Values.namespace }}
name: {{ .Values.serviceName }}-definitions-conf
data:
definitions.json: |
{{ .Values.definitionsConf | toJson |replace "NAMESPACE" $varNamespace }}
{{- $varNamespace := .Values.namespace}}
{{- $varRegionCode := .Values.regionCode}}
{{- $varEnv := .Values.env}}
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
name: {{ .Values.serviceName }}
namespace: {{ .Values.namespace }}
spec:
replicas: {{ .Values.resourcesConfig.replicas }}
rabbitmq:
envConfig: |
ERL_MAX_PORTS=10000000
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+S 4:4 +P 2000000"
advancedConfig: |
[
{kernel, [
{inet_default_connect_options, [{nodelay, true}]},
{inet_default_listen_options, [{nodelay, true}]}
]}
].
additionalPlugins: {{ .Values.additionalPlugins | toJson | indent 4 }}
additionalConfig: |
{{- range $key, $val := .Values.rabbitmqConf }}
{{ $key }} = {{ $val | replace "NAMESPACE" $varNamespace | replace "REGION_CODE" $varRegionCode | replace "ENV" $varEnv }}
{{- end }}
resources:
requests:
cpu: {{ .Values.resourcesConfig.resources.requests.cpu }}
memory: {{ .Values.resourcesConfig.resources.requests.memory }}
limits:
cpu: {{ .Values.resourcesConfig.resources.limits.cpu }}
memory: {{ .Values.resourcesConfig.resources.limits.memory }}
persistence:
storageClassName: {{ .Values.resourcesConfig.storageType }}
storage: {{ .Values.resourcesConfig.storageSize }}
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app.kubernetes.io/name
operator: In
values:
- {{ .Values.serviceName }}
topologyKey: kubernetes.io/hostname
service:
type: LoadBalancer
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: nlb
service.beta.kubernetes.io/aws-load-balancer-name: {{ .Values.serviceName }}
service.beta.kubernetes.io/load-balancer-source-ranges: {{ .Values.service.allowedVpcCidrRange }}
service.beta.kubernetes.io/aws-load-balancer-internal: 'true'
service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags: Name={{ .Values.serviceName }}
external-dns.alpha.kubernetes.io/hostname: {{ .Values.serviceName }}.{{ .Values.service.hostedZone }}
override:
statefulSet:
spec:
template:
metadata:
annotations:
platform.vonage.com/logging: enabled
telegraf.influxdata.com/class: influxdb
telegraf.influxdata.com/inputs: |+
[[inputs.prometheus]]
urls = ["http://127.0.0.1:{{ .Values.metricsConfig.metricsPort }}{{ .Values.metricsConfig.metricsPath }}"]
metric_version = 1
tagexclude = ["url"]
telegraf.influxdata.com/env-literal-NAMESPACE: {{ $.Values.namespace }}
telegraf.influxdata.com/env-literal-SERVICENAME: {{ $.Values.serviceName }}
spec:
nodeSelector:
node-group-label: {{ .Values.resourcesConfig.nodeGroupName }}
containers:
- name: rabbitmq
volumeMounts:
- name: definitions
mountPath: {{ .Values.rabbitmqConf.load_definitions }}
subPath: definitions.json
volumes:
- name: definitions
configMap:
name: {{ .Values.serviceName }}-definitions-conf
Can someone give us advice on what we can check or how we can solve this issue?
Thanks.
I'm trying to replace the RabbitMQ EC2 instances with a RabbitMQ k8s cluster, and we want the same results as (or better than) the separate instances.
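One way to quantify the difference (a suggestion, not something from the original setup) is to run the official RabbitMQ PerfTest tool with identical settings against both endpoints and compare the reported latency percentiles; the broker hosts and credentials below are placeholders:

# against the existing EC2 brokers
docker run -it --rm pivotalrabbitmq/perf-test:latest \
  --uri amqp://user:pass@ec2-broker-host:5672 --producers 10 --consumers 10 --rate 1000 --time 120
# against the k8s NLB endpoint
docker run -it --rm pivotalrabbitmq/perf-test:latest \
  --uri amqp://user:pass@rabbitmq.k8s-hosted-zone:5672 --producers 10 --consumers 10 --rate 1000 --time 120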

Unable to assign service account to Instance

I am using the Jinja instance-creation template below in GCP. When I create the deployment I get the error below.
I have already tried granting even the Owner and iam.serviceAccountUser roles to the xxxxxxxxxx#cloudservices.gserviceaccount.com account.
I have also added the xxxxxxxxxx#cloudservices.gserviceaccount.com user as a member with the iam.serviceAccountUser role under the Access control section of the new service account I created, test-vm-gke-init-vm#.iam.gserviceaccount.com.
ERROR: (gcloud.deployment-manager.deployments.update) Error in Operation [operation-1663746707027-5e92b3777f488-7748cc3b-a3d3cc8f]: errors:
- code: RESOURCE_ERROR
location: /deployments/demo-vm/resources/test-vm
message: "{\"ResourceType\":\"compute.v1.instance\",\"ResourceErrorCode\":\"SERVICE_ACCOUNT_ACCESS_DENIED\"\
,\"ResourceErrorMessage\":\"The user does not have access to service account 'serviceAccount:test-vm-gke-init-vm#<myproject>.iam.gserviceaccount.com'.\
\ User: 'xxxxxxxxxx#cloudservices.gserviceaccount.com'. Ask a project owner\
\ to grant you the iam.serviceAccountUser role on the service account\"}"
resources:
- type: compute.v1.instance
  name: {{ properties.name }}
  properties:
    machineType: https://www.googleapis.com/compute/v1/projects/{{ env["project"] }}/zones/{{ properties["zone"] }}/machineTypes/{{ properties["machineType"] }}
    zone: {{ properties["zone"] }}
    disks:
    - deviceName: boot
      type: PERSISTENT
      boot: true
      autoDelete: true
      initializeParams:
        diskName: disk-{{ env["deployment"] }}
        sourceImage: https://www.googleapis.com/compute/v1/projects/debian-cloud/global/images/debian-11-bullseye-v20220822
    networkInterfaces:
    - network: https://www.googleapis.com/compute/v1/projects/{{ env["project"] }}/global/networks/default
      accessConfigs:
      - name: External NAT
        type: ONE_TO_ONE_NAT
    serviceAccounts:
    - email: serviceAccount:$(ref.{{ properties.name }}-sa.email)
      scopes:
      - https://www.googleapis.com/auth/devstorage.read_only
      - https://www.googleapis.com/auth/logging.write
      - https://www.googleapis.com/auth/monitoring.write
      - https://www.googleapis.com/auth/servicecontrol
      - https://www.googleapis.com/auth/service.management.readonly
      - https://www.googleapis.com/auth/compute
      - https://www.googleapis.com/auth/cloud-platform
- name: {{ properties.name }}-sa
  type: gcp-types/iam-v1:projects.serviceAccounts
  properties:
    accountId: {{ properties.name }}-gke-init-vm
    displayName: {{ properties.name }}-gke-init-vm
  accessControl:
    gcpIamPolicy:
      bindings:
      - role: roles/iam.serviceAccountUser
        members:
        - "serviceAccount:<myproject>#cloudservices.gserviceaccount.com"
- name: {{ env["project"] }}-{{ properties.name }}-initnode-sa-binding
  type: gcp-types/cloudresourcemanager-v1:virtual.projects.iamMemberBinding
  properties:
    resource: {{ env["project"] }}
    member: serviceAccount:$(ref.{{ properties.name }}-sa.email)
    role: roles/container.clusterAdmin
- name: {{ env["project"] }}-{{ properties.name }}-initnode-sa-binding-v2
  type: gcp-types/cloudresourcemanager-v1:virtual.projects.iamMemberBinding
  properties:
    resource: {{ env["project"] }}
    member: serviceAccount:$(ref.{{ properties.name }}-sa.email)
    role: roles/container.admin
For reference, for anyone else who hits this issue: specify the plain reference below (without the serviceAccount: prefix) as the email in the instance's serviceAccounts block to resolve it.
$(ref.{{ properties.name }}-sa.email)
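Applied to the template above, that section would look roughly like this (scopes trimmed; the key change is dropping the serviceAccount: prefix):

    serviceAccounts:
    - email: $(ref.{{ properties.name }}-sa.email)
      scopes:
      - https://www.googleapis.com/auth/cloud-platform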

GCP GKE Ingress Health Checks

I have a deployment and service running in GKE using Deployment Manager. Everything about my service works correctly except that the ingress I am creating reports the service in a perpetually unhealthy state.
To be clear, everything about the deployment works except the healthcheck (and as a consequence, the ingress). This was working previously (circa late 2019), and apparently about a year ago GKE added some additional requirements for healthchecks on ingress target services and I have been unable to make sense of them.
I have put an explicit health check on the service, and it reports healthy, but the ingress does not recognize it. The service is using a NodePort but also has containerPort 80 open on the deployment, and it does respond with HTTP 200 to requests on :80 locally, but clearly that is not helping in the deployed service.
The cluster itself is a nearly identical copy of the Deployment Manager example.
Here is the deployment:
- name: {{ DEPLOYMENT }}
  type: {{ CLUSTER_TYPE }}:{{ DEPLOYMENT_COLLECTION }}
  metadata:
    dependsOn:
    - {{ properties['clusterType'] }}
  properties:
    apiVersion: apps/v1
    kind: Deployment
    namespace: {{ properties['namespace'] | default('default') }}
    metadata:
      name: {{ DEPLOYMENT }}
      labels:
        app: {{ APP }}
        tier: resters
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: {{ APP }}
          tier: resters
      template:
        metadata:
          labels:
            app: {{ APP }}
            tier: resters
        spec:
          containers:
          - name: rester
            image: {{ IMAGE }}
            resources:
              requests:
                cpu: 100m
                memory: 250Mi
            ports:
            - containerPort: 80
            env:
            - name: GCP_PROJECT
              value: {{ PROJECT }}
            - name: SERVICE_NAME
              value: {{ APP }}
            - name: MODE
              value: rest
            - name: REDIS_ADDR
              value: {{ properties['memorystoreAddr'] }}
... the service:
- name: {{ SERVICE }}
  type: {{ CLUSTER_TYPE }}:{{ SERVICE_COLLECTION }}
  metadata:
    dependsOn:
    - {{ properties['clusterType'] }}
    - {{ APP }}-cluster-nodeport-firewall-rule
    - {{ DEPLOYMENT }}
  properties:
    apiVersion: v1
    kind: Service
    namespace: {{ properties['namespace'] | default('default') }}
    metadata:
      name: {{ SERVICE }}
      labels:
        app: {{ APP }}
        tier: resters
    spec:
      type: NodePort
      ports:
      - nodePort: {{ NODE_PORT }}
        port: {{ CONTAINER_PORT }}
        targetPort: {{ CONTAINER_PORT }}
        protocol: TCP
      selector:
        app: {{ APP }}
        tier: resters
... the explicit healthcheck:
- name: {{ SERVICE }}-healthcheck
  type: compute.v1.healthCheck
  metadata:
    dependsOn:
    - {{ SERVICE }}
  properties:
    name: {{ SERVICE }}-healthcheck
    type: HTTP
    httpHealthCheck:
      port: {{ NODE_PORT }}
      requestPath: /healthz
      proxyHeader: NONE
    checkIntervalSec: 10
    healthyThreshold: 2
    unhealthyThreshold: 3
    timeoutSec: 5
... the firewall rules:
- name: {{ CLUSTER_NAME }}-nodeport-firewall-rule
  type: compute.v1.firewall
  properties:
    name: {{ CLUSTER_NAME }}-nodeport-firewall-rule
    network: projects/{{ PROJECT }}/global/networks/default
    sourceRanges:
    - 130.211.0.0/22
    - 35.191.0.0/16
    targetTags:
    - {{ CLUSTER_NAME }}-node
    allowed:
    - IPProtocol: TCP
      ports:
      - 30000-32767
      - 80
You could try defining a readinessProbe on the container in your Deployment.
This is also what the ingress uses to create its health checks (note that these health check probes come from outside of GKE).
In my experience, readiness probes work pretty well for getting the ingress health checks to pass.
To do this, you create something like the following. This is a TCP probe; I have seen better results with TCP probes.
readinessProbe:
  tcpSocket:
    port: 80
  initialDelaySeconds: 10
  periodSeconds: 10
So this probe will check port 80, which is the port the pod behind this service uses, and this will also help the ingress configure its health check for a better result.
Here is some helpful documentation on how to create the TCP readiness probes which the ingress health check can be based on.
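Dropped into the container spec of the Deployment Manager resource from the question, it would look roughly like this (only the containers section is shown):

containers:
- name: rester
  image: {{ IMAGE }}
  ports:
  - containerPort: 80
  readinessProbe:
    tcpSocket:
      port: 80
    initialDelaySeconds: 10
    periodSeconds: 10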

Ansible 2.5.2, AWS Lambda, Want to create a template that works whether or not I have subnets or security groups assigned

I'm using Ansible 2.5.2 to try to automate deployment of a Lambda function into AWS.
How do I write the template so that the code still deploys when the subnets or security groups section is blank?
The playbook below results in the error EC2 Error Message: The subnet ID '' does not exist
---
- name: Deploy and Update Lambda
  hosts: localhost
  gather_facts: no
  connection: local
  tasks:
    - name: Lambda Deploy
      lambda:
        profile: "{{ profile }}"
        name: '{{ item.name }}'
        state: present # absent or present
        zip_file: '{{ item.zip_file }}'
        runtime: 'python2.7'
        role: '{{ item.role }}'
        handler: 'hello_python.my_handler'
        vpc_subnet_ids: '{{ item.vpc_subnet_ids }}'
        vpc_security_group_ids: '{{ item.vpc_security_group_ids }}'
        environment_variables: '{{ item.env_vars }}'
        tags: "{{ item.tags }}"
      with_items:
        - name: AnsibleTest
          role: 'arn:aws:iam::xxxxxxxxxx:role/Dev-LambdaRole'
          zip_file: hello-code.zip
          vpc_subnet_ids:
          # - subnet-080802e6660be744c
          # - subnet-00a8380a28ae0528c
          # - subnet-0723ad3c29a435ee0
          vpc_security_group_ids:
            - sg-0fa788da8ecd36fe5
          env_vars:
            key1: "first"
            key2: "second"
          tags:
            x: "133"
            xx: "1"
            project-name: "x"
            xxx: "Ansible"
            app-function: "automation"
            Name: "AnsibleTest"
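One common approach (not from the original thread, and untested against this exact playbook) is Ansible's omit placeholder, so that empty or missing values are dropped from the module call entirely instead of being passed as empty strings:

        # hypothetical sketch: omit the parameters when the item leaves them empty
        vpc_subnet_ids: '{{ item.vpc_subnet_ids | default(omit, true) }}'
        vpc_security_group_ids: '{{ item.vpc_security_group_ids | default(omit, true) }}'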

Create empty disk for Google Deployment Manager

I am trying to create a compute instance using Google's Deployment Manager. It shall get two disks: one based on a boot image, and a second that shall be blank. The blank disk will later be formatted and mounted by Salt Stack. Deployment Manager complains with "Source image must be specified." How do I create a second, blank disk for a compute instance using Deployment Manager?
My compute-instance.jinja:
resources:
- type: compute.v1.instance
  name: {{ env["deployment"] }}-{{ env["name"] }}
  properties:
    zone: europe-west1-c
    machineType: zones/europe-west1-c/machineTypes/n1-standard-1
    disks:
    - deviceName: {{ env["deployment"] }}-{{ env["name"] }}
      type: PERSISTENT
      boot: true
      autoDelete: true
      initializeParams:
        sourceImage: global/images/XXXXXXX
    - deviceName: {{ env["deployment"] }}-{{ env["name"] }}-data
      type: PERSISTENT
      boot: false
      autoDelete: true
      initializeParams:
        diskSizeGb: {{ properties["size"] }}
    networkInterfaces:
    - network: global/networks/default
      accessConfigs:
      - name: External NAT
        type: ONE_TO_ONE_NAT
Solved by creating a separate resource for the disk using:
- type: compute.v1.disk
  name: {{ env["deployment"] }}-{{ env["name"] }}-1-data
  properties:
    sizeGb: {{ properties["size"] }}
    zone: europe-west1-c
Then refer to it from the compute-instance:
- deviceName: {{ env["deployment"] }}-{{ env["name"] }}-1-data
  boot: false
  autoDelete: true
  source: $(ref.{{ env["deployment"] }}-{{ env["name"] }}-1-data.selfLink)
If you must supply a source image, you can create an image of an empty disk.
The downside of this is that you will start paying (not much, though) for 10 GB of image storage while actually storing nothing.
gcloud compute disks create emptydisk --size 10GB
gcloud compute images create empty-disk-image --source-disk emptydisk
And then use it as the source image
- deviceName: {{ env["deployment"] }}-{{ env["name"] }}-data
  type: PERSISTENT
  boot: false
  autoDelete: true
  initializeParams:
    sourceImage: projects/your-project-id/global/images/empty-disk-image
    diskSizeGb: {{ properties["size"] }}
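Either way, you can confirm the image exists and, after deployment, that both disks are attached (the instance name below is a placeholder):

gcloud compute images describe empty-disk-image
gcloud compute instances describe my-instance --zone europe-west1-c --format="yaml(disks)"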