Add environment-based multiple notification channels to a GCP alert policy with Terraform lookup

I'm trying to add multiple notification channels to a GCP Alert policy with terraform.
My issue is that I need to add different notification channels depending on the environment the policy is deployed to.
As long as each environment has a single notification channel, I can easily deploy in the following way.
Here is my variables.tf file:
locals {
notification_channel = {
DEV = "projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID]"
PRD = "projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID]"
}
}
Here is my main.tf file:
resource "google_monitoring_alert_policy" "alert_policy" {
display_name = "My Alert Policy"
combiner = "OR"
conditions {
display_name = "test condition"
condition_threshold {
filter = "metric.type=\"compute.googleapis.com/instance/disk/write_bytes_count\" AND resource.type=\"gce_instance\""
duration = "60s"
comparison = "COMPARISON_GT"
aggregations {
alignment_period = "60s"
per_series_aligner = "ALIGN_RATE"
}
}
}
user_labels = {
foo = "bar"
}
notification_channels = [lookup(local.notification_channel, terraform.workspace)]
}
My issue here happens when I try to map multiple notification channels instead of one per environment.
Something like:
locals {
notification_channel = {
DEV = ["projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID]", "projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID]" ]...
}
}
However, if I try it this way, Terraform reports: Inappropriate value for attribute "notification_channels": element 0: string.
Here is the documentation for:
Terraform lookup function
GCP Alert Policy
Could you help?

If I understood your question correctly, you only need to remove the square brackets:
notification_channels = lookup(local.notification_channel, terraform.workspace)
Since each value in the local map notification_channel is already a list, lookup returns the list for the workspace you are currently in. Wrapping that result in square brackets produces a list of lists, which is why the attribute rejects it.
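A minimal sketch of the complete configuration, assuming the workspaces are named DEV and PRD as in the question. The channel IDs are the question's placeholders, and the condition block is copied from the question unchanged:

```hcl
locals {
  # Map of workspace name => list of notification channel IDs.
  notification_channel = {
    DEV = [
      "projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID]",
      "projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID]",
    ]
    PRD = [
      "projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID]",
    ]
  }
}

resource "google_monitoring_alert_policy" "alert_policy" {
  display_name = "My Alert Policy"
  combiner     = "OR"

  conditions {
    display_name = "test condition"

    condition_threshold {
      filter     = "metric.type=\"compute.googleapis.com/instance/disk/write_bytes_count\" AND resource.type=\"gce_instance\""
      duration   = "60s"
      comparison = "COMPARISON_GT"

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_RATE"
      }
    }
  }

  # lookup already returns a list here, so no surrounding brackets.
  # On Terraform 0.12 or later, local.notification_channel[terraform.workspace]
  # should be equivalent.
  notification_channels = lookup(local.notification_channel, terraform.workspace)
}
```

If a workspace has no entry in the map, the lookup fails at plan time, which is usually preferable to silently deploying a policy with no channels.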

Related

GCP Alerting Policy to Alert on KMS Key Deletion Using Terraform

I am trying to alert on KMS Key deletions using terraform.
I have a log based metric, a policy and a notification channel to PagerDuty.
This all works. However, shortly after the alert triggers it clears, and there seems to be nothing I can do to stop this.
Here is my code:
resource "google_logging_metric" "logging_metric" {
name = "kms-key-pending-deletion"
description = "Logging metric used to alert on scheduled deletions of KMS keys"
filter = "resource.type=cloudkms_cryptokeyversion AND protoPayload.methodName=DestroyCryptoKeyVersion"
metric_descriptor {
metric_kind = "DELTA"
value_type = "INT64"
unit = "1"
display_name = "kms-key-pending-deletion-metric-descriptor"
}
}
resource "google_monitoring_notification_channel" "pagerduty_alerts" {
display_name = "pagerduty-notification-channel"
type = "pagerduty"
sensitive_labels {
service_key = var.token
}
}
resource "google_monitoring_alert_policy" "kms_key_deletion_alert_policy" {
display_name = "kms-key-deletion-alert-policy"
combiner = "OR"
notification_channels = [google_monitoring_notification_channel.pagerduty_alerts.name]
conditions {
display_name = "kms-key-deletion-alert-policy-conditions"
condition_threshold {
comparison = "COMPARISON_GT"
duration = "300s"
filter = "metric.type=\"logging.googleapis.com/user/kms-key-pending-deletion\" AND resource.type=\"global\""
threshold_value = "0"
}
}
documentation {
content = "Runbook: https://blah"
}
}
In the GCP GUI I can disable the option "Notify on incident closure" in the policy and it stops the alert from clearing.
However I cannot set this via terraform.
I have tried setting alert_strategy.auto_close to null and 0s but this did not work:
alert_strategy {
auto_close = "0s"
# auto_close = null
}
How do I keep the alert active and stop it from clearing when building the policy in terraform?
Am I using the correct resource type? - Should I be using cloudkms.cryptoKey.state that are in "DESTROY_SCHEDULED" state somehow?
For others looking for the answer to this:
The ability to keep an alert open and prevent it from closing automatically is missing from the API.
The issue is tracked here: https://issuetracker.google.com/issues/151052441?pli=1

Terraform Google provider, create log-based alerting policy

I need to create a log-based alerting policy via the Terraform Google Cloud provider:
https://cloud.google.com/logging/docs/alerting/monitoring-logs#lba
I checked the official Terraform documentation and found the 'google_monitoring_alert_policy' resource: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy
I could not find in this documentation how to create a log-based alerting policy.
I can create an alerting policy of type 'Metrics' but not of type 'Logs'.
I am using the latest version of the Terraform Google Cloud provider: https://registry.terraform.io/providers/hashicorp/google/latest
How can I create a log-based alerting policy with the Terraform Google provider?
Thanks in advance for your help.
The problem is solved in version 4.7.0 of the Google provider, which adds condition_matched_log. Here is a working example:
resource "google_monitoring_notification_channel" "email-me" {
display_name = "Email Me"
type = "email"
labels = {
email_address = "me@mycompany.com"
}
}
resource "google_monitoring_alert_policy" "workflows" {
display_name = "Workflows alert policy"
combiner = "OR"
conditions {
display_name = "Error condition"
condition_matched_log {
filter = "resource.type=\"workflows.googleapis.com/Workflow\" severity=ERROR"
}
}
notification_channels = [ google_monitoring_notification_channel.email-me.name ]
alert_strategy {
notification_rate_limit {
period = "300s"
}
}
}
Thanks, Guillaume.
Yes, that is how I solved the issue.
Currently there is no way to create a log-type alert directly via Terraform.
The steps to solve this problem:
Create a log-based metric with the expected filter
Create a metric-type alerting policy based on the previously created log-based metric
resource "google_logging_metric" "my_log_metrics" {
project = var.project_id
name = "my-log-metric"
filter = "..."
description = "..."
metric_descriptor {
metric_kind = "..."
value_type = "..."
}
}
resource "google_monitoring_alert_policy" "my_policy" {
project = var.project_id
display_name = "my-policy"
combiner = "OR"
conditions {
display_name = "my-policy"
condition_threshold {
filter = "metric.type=\"logging.googleapis.com/user/my-log-metric\" AND resource.type=\"cloud_composer_environment\""
...
}
}
}
The metric type format is logging.googleapis.com/user/<user metric name>.
Look at this example (no notification channel, only the alert policy):
resource "google_monitoring_alert_policy" "alert_policy" {
display_name = "My Alert Policy"
combiner = "OR"
conditions {
display_name = "test condition"
condition_threshold {
filter = "metric.type=\"logging.googleapis.com/user/test-metrics\" AND resource.type=\"cloud_run_revision\""
duration = "600s"
comparison = "COMPARISON_GT"
threshold_value = 1
}
}
user_labels = {
foo = "bar"
}
}
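The logging.googleapis.com/user/<name> format can also be built from the metric resource itself rather than typed by hand. The following is a sketch under assumptions: the metric name test-metrics and the cloud_run_revision resource type come from the example above, while the log filter and the resource labels test_metrics and from_log_metric are hypothetical.

```hcl
# Log-based metric. The filter is an illustrative assumption.
resource "google_logging_metric" "test_metrics" {
  name   = "test-metrics"
  filter = "resource.type=\"cloud_run_revision\" AND severity>=ERROR"

  metric_descriptor {
    metric_kind = "DELTA"
    value_type  = "INT64"
  }
}

resource "google_monitoring_alert_policy" "from_log_metric" {
  display_name = "Errors from log-based metric"
  combiner     = "OR"

  conditions {
    display_name = "log metric above threshold"

    condition_threshold {
      # Interpolating the metric name keeps the two resources in sync and
      # gives Terraform an implicit dependency, so the metric is created
      # before the policy that refers to it.
      filter          = "metric.type=\"logging.googleapis.com/user/${google_logging_metric.test_metrics.name}\" AND resource.type=\"cloud_run_revision\""
      duration        = "600s"
      comparison      = "COMPARISON_GT"
      threshold_value = 1
    }
  }
}
```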

How to create a slack notification channel in Google Cloud Platform with terraform

I'm trying to create a slack notification channel in GCP with terraform. I am able to create a channel with the code below, but it's missing "Team" and "Owner" attributes.
resource "google_monitoring_notification_channel" "default" {
display_name = "Test Slack Channel"
type = "slack"
enabled = "true"
labels = {
"channel_name" = "#testing"
"auth_token" = "<my_slack_app_token>"
}
}
The first channel in the screenshot below was created via the GUI and works fine. The second channel was created via Terraform and is unable to send notifications:
The Terraform registry does not mention these attributes. I have tried defining them in labels right after channel_name:
labels = {
"channel_name" = "#testing"
"team" = "<my_team>"
"owner" = "google_cloud_monitoring"
"auth_token" = "<my_slack_app_token>"
}
I got the following error:
Error creating NotificationChannel: googleapi: Error 400: Field "notification_channel.labels['owner']" is not allowed; labels must conform to the channel type's descriptor; permissible label keys for "slack" are: {"auth_token", "channel_name"}
Apparently, only channel_name and auth_token are valid labels.
What am I missing?
Slack needs the sensitive_labels block for tokens. There is an example in the docs:
resource "google_monitoring_notification_channel" "default" {
display_name = "Test Slack Channel"
type = "slack"
labels = {
"channel_name" = "#foobar"
}
sensitive_labels {
auth_token = "...."
}
}
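To keep the token out of the configuration files, it can be passed in through a variable. This is a sketch rather than the documented example: the variable name slack_auth_token and the resource label slack are hypothetical, and the sensitive flag on variables assumes Terraform 0.14 or later.

```hcl
# Hypothetical variable; supply the value via TF_VAR_slack_auth_token
# or a tfvars file that is not committed to version control.
variable "slack_auth_token" {
  description = "Slack app token used by the notification channel"
  type        = string
  sensitive   = true # assumes Terraform 0.14 or later
}

resource "google_monitoring_notification_channel" "slack" {
  display_name = "Test Slack Channel"
  type         = "slack"

  # Non-secret labels stay in the labels map.
  labels = {
    channel_name = "#foobar"
  }

  # The token goes in sensitive_labels instead of labels.
  sensitive_labels {
    auth_token = var.slack_auth_token
  }
}
```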

How to create an alert policy for unknown custom metric in GCP

Given the following alert policy in GCP (created with terraform)
resource "google_monitoring_alert_policy" "latency_alert_policy" {
display_name = "Latency of 95th percentile more than 1 second"
combiner = "OR"
conditions {
display_name = "Latency of 95th percentile more than 1 second"
condition_threshold {
filter = "metric.type=\"custom.googleapis.com/http/server/requests/p95\" resource.type=\"k8s_pod\""
threshold_value = 1000
duration = "60s"
comparison = "COMPARISON_GT"
aggregations {
alignment_period = "60s"
per_series_aligner= "ALIGN_NEXT_OLDER"
cross_series_reducer= "REDUCE_MAX"
group_by_fields = [
"metric.label.\"uri\"",
"metric.label.\"method\"",
"metric.label.\"status\"",
"metadata.user_labels.\"app.kubernetes.io/name\"",
"metadata.user_labels.\"app.kubernetes.io/component\""
]
}
trigger {
count = 1
percent = 0
}
}
}
}
I get the following error (the policy is part of a Terraform project that also creates the cluster):
Error creating AlertPolicy: googleapi: Error 404: The metric referenced by the provided filter is unknown. Check the metric name and labels.
Now, this is a custom metric (by a Spring Boot app with Micrometer), therefore this metric does not exist when creating infrastructure. Does GCP have to know a metric before creating an alert for it? This would mean that a Spring boot app has to be deployed on a cluster and sending metrics before this policy can be created?
Am I missing something... (like this should not be done in terraform, infrastructure)?
Interesting question. The 404 error occurs because the metric descriptor was not found; the descriptor appears to be a prerequisite for the alert policy. I would create the metric descriptor first (the resource below can serve as a reference) and then create the alerting policy.
This is one way you may be able to avoid the error. Please comment if it makes sense, and if you get it working like this, share it.
For reference (according to the Terraform docs, this descriptor can be referenced from the alert policy):
resource "google_monitoring_metric_descriptor" "p95_latency" {
description = ""
display_name = ""
type = "custom.googleapis.com/http/server/requests/p95"
metric_kind = "GAUGE"
value_type = "DOUBLE"
labels {
key = "status"
}
labels {
key = "uri"
}
labels {
key = "exception"
}
labels {
key = "method"
}
labels {
key = "outcome"
}
}
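One way to reference the descriptor from the alert policy is to interpolate its type attribute into the filter. This is a sketch that assumes the p95_latency descriptor above is defined in the same configuration; the policy is a shortened version of the one in the question.

```hcl
resource "google_monitoring_alert_policy" "latency_alert_policy" {
  display_name = "Latency of 95th percentile more than 1 second"
  combiner     = "OR"

  conditions {
    display_name = "Latency of 95th percentile more than 1 second"

    condition_threshold {
      # Referencing the descriptor's type creates an implicit dependency,
      # so Terraform creates the descriptor before the policy.
      filter          = "metric.type=\"${google_monitoring_metric_descriptor.p95_latency.type}\" resource.type=\"k8s_pod\""
      threshold_value = 1000
      duration        = "60s"
      comparison      = "COMPARISON_GT"

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_NEXT_OLDER"
      }
    }
  }
}
```

An explicit depends_on pointing at the descriptor should give the same ordering if the filter has to stay a literal string.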

Using Count in Terraform to create Launch Configuration

I have 3 different versions of an AMI, for 3 different nodes in a cluster.
data "aws_ami" "node1"
{
# Use the most recent AMI that matches the pattern below in 'values'.
most_recent = true
filter {
name = "name"
values = ["AMI_node1*"]
}
filter {
name = "tag:version"
values = ["${var.node1_version}"]
}
}
data "aws_ami" "node2"
{
# Use the most recent AMI that matches the pattern below in 'values'.
most_recent = true
filter {
name = "name"
values = ["AMI_node2*"]
}
filter {
name = "tag:version"
values = ["${var.node2_version}"]
}
}
data "aws_ami" "node3"
{
...
}
I would like to create 3 different Launch Configurations and Auto Scaling Groups, each using one of the AMIs.
resource "aws_launch_configuration" "node"
{
count = "${local.node_instance_count}"
# Name-prefix must be used otherwise terraform fails to perform updates to existing launch configurations due to
# a name conflict: LCs are immutable and the LC cannot be destroyed without destroying attached ASGs as well, which
# terraform will not do. Using name-prefix lets a new LC be created and swapped into the ASG.
name_prefix = "${var.environment_name}-node${count.index + 1}-"
image_id = "${data.aws_ami.node[count.index].image_id}"
instance_type = "${var.default_ec2_instance_type}"
...
}
However, I am not able to select aws_ami.node1, aws_ami.node2, and aws_ami.node3 using count.index the way I have shown above. I get the following error:
Error reading config for aws_launch_configuration[node]: parse error at 1:39: expected "}" but found "."
Is there another way I can do this in Terraform?
Indexing data sources isn't doable at the moment.
You're likely better off dropping the data sources you've defined and codifying the image IDs in a Terraform map variable.
variable "node_image_ids" {
type = "map"
default = {
"node1" = "1234434"
"node2" = "1233334"
"node3" = "1222434"
}
}
Then, consume it:
image_id = "${lookup(var.node_image_ids, format("node%d", count.index + 1), "some_default_image_id")}"
The downside of this is that you'll need to manually update the image id when images are upgraded.
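On newer Terraform versions the manual update may be avoidable. The sketch below assumes Terraform 0.12.6 or later, where data sources and resources accept for_each. The node_versions variable and its values are hypothetical, the owners value is an assumption about where the AMIs live, and var.environment_name and var.default_ec2_instance_type are the question's variables, assumed to be declared elsewhere.

```hcl
# Hypothetical map of node name => AMI version tag.
variable "node_versions" {
  type = map(string)
  default = {
    node1 = "1.0.0"
    node2 = "1.0.0"
    node3 = "1.0.0"
  }
}

# One AMI lookup per node, keyed by node name.
data "aws_ami" "node" {
  for_each    = var.node_versions
  most_recent = true
  owners      = ["self"] # assumption: the AMIs belong to this account

  filter {
    name   = "name"
    values = ["AMI_${each.key}*"]
  }

  filter {
    name   = "tag:version"
    values = [each.value]
  }
}

# One launch configuration per AMI found above.
resource "aws_launch_configuration" "node" {
  for_each      = data.aws_ami.node
  name_prefix   = "${var.environment_name}-${each.key}-"
  image_id      = each.value.image_id
  instance_type = var.default_ec2_instance_type

  lifecycle {
    create_before_destroy = true
  }
}
```

Individual launch configurations are then addressed by key, for example aws_launch_configuration.node["node1"].name.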