Terraform Google provider, create log-based alerting policy - google-cloud-platform

I need to create a log-based alerting policy with the Terraform Google Cloud provider:
https://cloud.google.com/logging/docs/alerting/monitoring-logs#lba
I checked the official Terraform documentation and found the 'google_monitoring_alert_policy' resource: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy
However, I could not figure out from that documentation how to create a log-based alerting policy.
I can create an alerting policy of type 'Metrics', but not of type 'Logs'.
I am using the latest version of the Terraform Google Cloud provider: https://registry.terraform.io/providers/hashicorp/google/latest
How can I create a log-based alerting policy with the Terraform Google provider?
Thanks in advance for your help.

The problem is solved with version 4.7.0 of the google provider, which adds condition_matched_log. Here is a working example:
resource "google_monitoring_notification_channel" "email-me" {
  display_name = "Email Me"
  type         = "email"
  labels = {
    email_address = "me@mycompany.com"
  }
}

resource "google_monitoring_alert_policy" "workflows" {
  display_name = "Workflows alert policy"
  combiner     = "OR"
  conditions {
    display_name = "Error condition"
    condition_matched_log {
      filter = "resource.type=\"workflows.googleapis.com/Workflow\" severity=ERROR"
    }
  }
  notification_channels = [google_monitoring_notification_channel.email-me.name]
  alert_strategy {
    notification_rate_limit {
      period = "300s"
    }
  }
}
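Since condition_matched_log is only available from provider version 4.7.0 onward, it can help to pin the provider version explicitly. A minimal sketch (the exact constraint is up to you):

```hcl
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 4.7.0" # condition_matched_log was added in 4.7.0
    }
  }
}
```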

Thanks, Guillaume.
Yes, that is how I solved the issue.
For now there is no way to directly create a log-type alerting policy via Terraform.
The steps to work around this problem:
Create a log-based metric with the expected filter
Create a metric-type alerting policy based on the previously created log-based metric
resource "google_logging_metric" "my_log_metrics" {
  project     = var.project_id
  name        = "my-log-metric"
  filter      = "..."
  description = "..."
  metric_descriptor {
    metric_kind = "..."
    value_type  = "..."
  }
}

resource "google_monitoring_alert_policy" "my_policy" {
  project      = var.project_id
  display_name = "my-policy"
  combiner     = "OR"
  conditions {
    display_name = "my-policy"
    condition_threshold {
      filter = "metric.type=\"logging.googleapis.com/user/my-log-metric\" AND resource.type=\"cloud_composer_environment\""
      ...
    }
  }
}

The metric type format is logging.googleapis.com/user/<user metric name>.
Look at this example (no notification channel, only the alert policy):
resource "google_monitoring_alert_policy" "alert_policy" {
  display_name = "My Alert Policy"
  combiner     = "OR"
  conditions {
    display_name = "test condition"
    condition_threshold {
      filter          = "metric.type=\"logging.googleapis.com/user/test-metrics\" AND resource.type=\"cloud_run_revision\""
      duration        = "600s"
      comparison      = "COMPARISON_GT"
      threshold_value = 1
    }
  }
  user_labels = {
    foo = "bar"
  }
}

Related

Add environment based Multiple Notification Channel to GCP Alert Policy with Terraform Lookup

I'm trying to add multiple notification channels to a GCP alert policy with Terraform.
My issue is that I need to add different notification channels depending on the environment where they are deployed.
As long as I keep a single notification channel per environment, I can easily deploy in the following way.
Here is my variables.tf file:
locals {
  notification_channel = {
    DEV = "projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID]"
    PRD = "projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID]"
  }
}
Here is my main.tf file:
resource "google_monitoring_alert_policy" "alert_policy" {
  display_name = "My Alert Policy"
  combiner     = "OR"
  conditions {
    display_name = "test condition"
    condition_threshold {
      filter     = "metric.type=\"compute.googleapis.com/instance/disk/write_bytes_count\" AND resource.type=\"gce_instance\""
      duration   = "60s"
      comparison = "COMPARISON_GT"
      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_RATE"
      }
    }
  }
  user_labels = {
    foo = "bar"
  }
  notification_channels = [lookup(local.notification_channel, terraform.workspace)]
}
My issue here happens when I try to map multiple notification channels instead of one per environment.
Something like:
locals {
  notification_channel = {
    DEV = ["projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID]", "projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID]"]...
  }
}
However, if I try it this way, the system tells me: Inappropriate value for attribute "notification_channels": element 0: string.
Here is the relevant documentation:
Terraform lookup function
GCP Alert Policy
Could you help?
If I understood your question correctly, you only need to remove the square brackets:
notification_channels = lookup(local.notification_channel, terraform.workspace)
Since each value in the local variable notification_channel is already a list, you only need lookup to fetch the value for the workspace you are currently in.
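Putting it together, a minimal sketch (the channel IDs are placeholders, as in the question):

```hcl
locals {
  notification_channel = {
    DEV = [
      "projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID]",
      "projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID]",
    ]
    PRD = [
      "projects/[PROJECT_ID]/notificationChannels/[CHANNEL_ID]",
    ]
  }
}

# Inside the alert policy resource: no square brackets, because the
# looked-up value is already a list of channel IDs.
# notification_channels = lookup(local.notification_channel, terraform.workspace)
```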

GCP Alerting Policy to Alert on KMS Key Deletion Using Terraform

I am trying to alert on KMS key deletions using Terraform.
I have a log-based metric, a policy, and a notification channel to PagerDuty.
This all works; however, after the alert triggers it soon clears, and there seems to be nothing I can do to stop this.
Here is my code:
resource "google_logging_metric" "logging_metric" {
  name        = "kms-key-pending-deletion"
  description = "Logging metric used to alert on scheduled deletions of KMS keys"
  filter      = "resource.type=cloudkms_cryptokeyversion AND protoPayload.methodName=DestroyCryptoKeyVersion"
  metric_descriptor {
    metric_kind  = "DELTA"
    value_type   = "INT64"
    unit         = "1"
    display_name = "kms-key-pending-deletion-metric-descriptor"
  }
}

resource "google_monitoring_notification_channel" "pagerduty_alerts" {
  display_name = "pagerduty-notification-channel"
  type         = "pagerduty"
  sensitive_labels {
    service_key = var.token
  }
}

resource "google_monitoring_alert_policy" "kms_key_deletion_alert_policy" {
  display_name          = "kms-key-deletion-alert-policy"
  combiner              = "OR"
  notification_channels = [google_monitoring_notification_channel.pagerduty_alerts.name]
  conditions {
    display_name = "kms-key-deletion-alert-policy-conditions"
    condition_threshold {
      comparison      = "COMPARISON_GT"
      duration        = "300s"
      filter          = "metric.type=\"logging.googleapis.com/user/kms-key-pending-deletion\" AND resource.type=\"global\""
      threshold_value = "0"
    }
  }
  documentation {
    content = "Runbook: https://blah"
  }
}
In the GCP GUI I can disable the option "Notify on incident closure" in the policy, which stops the alert from clearing.
However, I cannot set this via Terraform.
I have tried setting alert_strategy.auto_close to null and to "0s", but neither worked:
alert_strategy {
  auto_close = "0s"
  # auto_close = null
}
How do I keep the alert active and stop it from clearing when building the policy in Terraform?
Am I using the correct resource type? Or should I somehow be watching for cloudkms.cryptoKey versions whose state is "DESTROY_SCHEDULED" instead?
For others wanting to find the answer to this:
The ability to keep an alert open and prevent it from closing automatically is missing from the API.
The issue is tracked here: https://issuetracker.google.com/issues/151052441?pli=1

error creating SageMaker project in terraform because service catalog product "does not exist or access was denied"

I have a product with id: prod-xxxxxxxxxxxx. I have checked that it exists in aws service catalog. However, when I try to create an aws_sagemaker_project using terraform:
resource "aws_sagemaker_project" "test-project" {
  project_name = "test-project"
  service_catalog_provisioning_details {
    product_id = "prod-xxxxxxxxxxxx"
  }
}
I get the error: "error creating SageMaker project: ValidationException: Product prod-xxxxxxxxxxxx does not exist or access was denied". How do I ensure that I can access this product?
Do I need a launch constraint for this product, and to grant access to the portfolio to end users as described here: https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-templates-custom.html?
You need an aws_servicecatalog_product resource in your Terraform state and should reference its ID as product_id.
resource "aws_servicecatalog_product" "example" {
  name  = "example"
  owner = "example-owner"              # owner is a plain string, not a resource reference
  type  = "CLOUD_FORMATION_TEMPLATE"   # the product type, not a resource reference
  provisioning_artifact_parameters {
    template_url = "https://s3.amazonaws.com/cf-templates-ozkq9d3hgiq2-us-east-1/temp1.json"
  }
  tags = {
    foo = "bar"
  }
}

resource "aws_sagemaker_project" "example" {
  project_name = "example"
  service_catalog_provisioning_details {
    product_id = aws_servicecatalog_product.example.id
  }
}
This error means that you haven't granted access to the Service Catalog portfolio to your Terraform IAM principal/user/role. Basically, you are unable to "see" the product, per the end-user section of the Service Catalog portfolio.
You can fix this by adding the following Service Catalog resources:
resource "aws_servicecatalog_principal_portfolio_association" "project" {
  portfolio_id  = aws_servicecatalog_portfolio.portfolio.id
  principal_arn = "${ROLE_ARN}"
}

resource "aws_servicecatalog_portfolio" "portfolio" {
  name          = "My App Portfolio"
  description   = "List of my organizations apps"
  provider_name = "Brett"
}

resource "aws_servicecatalog_product_portfolio_association" "example" {
  portfolio_id = aws_servicecatalog_portfolio.portfolio.id
  product_id   = "prod-xxxxxxxxxxxx"
}

Deploy multiple Cloudrun service with same dockerimage

There are 25+ Cloud Run services that use the same Docker image (from GCR) but are configured with different variables. What is an easy and reliable method to deploy all the services with the latest container image from any kind of incoming event?
Currently I am using the CLI command below to deploy them one by one manually. Is there an automated way to deploy all the services one after another, or in parallel?
gcloud run deploy SERVICE --image IMAGE_URL
Additionally: labels are used to mark the 25 containers that share the same container image. The Docker image does not need to be rebuilt from source every time; the same image can be reused.
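Since the services already carry a common label, one option outside Terraform is a small shell loop over gcloud run services list. This is a sketch, assuming a hypothetical label deploy-group=shared-image and placeholder values for the image and region:

```shell
#!/usr/bin/env bash
set -euo pipefail

IMAGE_URL="gcr.io/MY-PROJECT-ID/my-image:latest"  # placeholder image
REGION="MY-REGION"                                # placeholder region

# List the services marked with the shared-image label, then redeploy
# each one with the new image, in parallel.
for SERVICE in $(gcloud run services list \
    --region "$REGION" \
    --filter='metadata.labels.deploy-group=shared-image' \
    --format='value(metadata.name)'); do
  gcloud run deploy "$SERVICE" --image "$IMAGE_URL" --region "$REGION" &
done
wait
```

Dropping the trailing & (and the wait) deploys the services sequentially instead of in parallel.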
In case Terraform is an option for you, you can automate deployment of all Cloud Run services using either the count or for_each meta-arguments.
count if you need the same service name with indexes:
provider "google" {
  project = "MY-PROJECT-ID"
}

resource "google_cloud_run_service" "default" {
  count    = 25
  name     = "MY-SERVICE-${count.index}"
  location = "MY-REGION"
  metadata {
    annotations = {
      "run.googleapis.com/client-name" = "terraform"
    }
  }
  template {
    spec {
      containers {
        image = "IMAGE_URL"
      }
    }
  }
}

data "google_iam_policy" "noauth" {
  binding {
    role    = "roles/run.invoker"
    members = ["allUsers"]
  }
}

resource "google_cloud_run_service_iam_policy" "noauth" {
  # count produces a list of resources, so convert it to a map for for_each
  for_each    = { for idx, svc in google_cloud_run_service.default : idx => svc }
  location    = each.value.location
  project     = each.value.project
  service     = each.value.name
  policy_data = data.google_iam_policy.noauth.policy_data
}
where MY-PROJECT-ID and MY-REGION need to be replaced with your project-specific values.
for_each if you need different service names:
provider "google" {
  project = "MY-PROJECT-ID"
}

resource "google_cloud_run_service" "default" {
  # Cloud Run service names must be lowercase letters, digits, and hyphens
  for_each = toset(["service-1", "service-2", "service-25"])
  name     = each.key
  location = "MY-REGION"
  metadata {
    annotations = {
      "run.googleapis.com/client-name" = "terraform"
    }
  }
  template {
    spec {
      containers {
        image = "IMAGE_URL"
      }
    }
  }
}

data "google_iam_policy" "noauth" {
  binding {
    role    = "roles/run.invoker"
    members = ["allUsers"]
  }
}

resource "google_cloud_run_service_iam_policy" "noauth" {
  for_each    = google_cloud_run_service.default
  location    = each.value.location
  project     = each.value.project
  service     = each.value.name
  policy_data = data.google_iam_policy.noauth.policy_data
}
where MY-PROJECT-ID and MY-REGION need to be replaced with your project-specific values as well.
You can refer to the official GCP Cloud Run documentation for further details on Terraform usage.

How to create an alert policy for unknown custom metric in GCP

Given the following alert policy in GCP (created with terraform)
resource "google_monitoring_alert_policy" "latency_alert_policy" {
  display_name = "Latency of 95th percentile more than 1 second"
  combiner     = "OR"
  conditions {
    display_name = "Latency of 95th percentile more than 1 second"
    condition_threshold {
      filter          = "metric.type=\"custom.googleapis.com/http/server/requests/p95\" resource.type=\"k8s_pod\""
      threshold_value = 1000
      duration        = "60s"
      comparison      = "COMPARISON_GT"
      aggregations {
        alignment_period     = "60s"
        per_series_aligner   = "ALIGN_NEXT_OLDER"
        cross_series_reducer = "REDUCE_MAX"
        group_by_fields = [
          "metric.label.\"uri\"",
          "metric.label.\"method\"",
          "metric.label.\"status\"",
          "metadata.user_labels.\"app.kubernetes.io/name\"",
          "metadata.user_labels.\"app.kubernetes.io/component\""
        ]
      }
      trigger {
        count   = 1
        percent = 0
      }
    }
  }
}
I get the following error (this is part of a Terraform project that also creates the cluster):
Error creating AlertPolicy: googleapi: Error 404: The metric referenced by the provided filter is unknown. Check the metric name and labels.
Now, this is a custom metric (emitted by a Spring Boot app with Micrometer), so the metric does not exist when the infrastructure is created. Does GCP have to know a metric before an alert can be created for it? That would mean a Spring Boot app has to be deployed on a cluster and sending metrics before this policy can be created.
Am I missing something (like this should not be done in Terraform/infrastructure)?
Interesting question. The reason for the 404 error is that the resource was not found: there seems to be a pre-existing prerequisite for the metric descriptor. I would create the metric descriptor first, then go forward with creating the alerting policy.
This is a way you may be able to avoid the error. Please comment if it makes sense and, if you make it work like this, share it.
For reference (this can be referenced from the alert policy, according to the Terraform docs):
resource "google_monitoring_metric_descriptor" "p95_latency" {
  description  = ""
  display_name = ""
  type         = "custom.googleapis.com/http/server/requests/p95"
  metric_kind  = "GAUGE"
  value_type   = "DOUBLE"
  labels {
    key = "status"
  }
  labels {
    key = "uri"
  }
  labels {
    key = "exception"
  }
  labels {
    key = "method"
  }
  labels {
    key = "outcome"
  }
}
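To make Terraform create the descriptor before the alert policy, the policy can also declare an explicit dependency on it. A minimal sketch combining the question's condition with the descriptor above (aggregations omitted for brevity):

```hcl
resource "google_monitoring_alert_policy" "latency_alert_policy" {
  display_name = "Latency of 95th percentile more than 1 second"
  combiner     = "OR"

  conditions {
    display_name = "Latency of 95th percentile more than 1 second"
    condition_threshold {
      filter          = "metric.type=\"custom.googleapis.com/http/server/requests/p95\" resource.type=\"k8s_pod\""
      threshold_value = 1000
      duration        = "60s"
      comparison      = "COMPARISON_GT"
    }
  }

  # Ensure the metric descriptor exists before the policy is created.
  depends_on = [google_monitoring_metric_descriptor.p95_latency]
}
```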