GCP terraform - alerts module based on log metrics

As per the subject, I have set up log-based metrics for a platform in GCP, i.e. firewall, audit, and route monitoring.
Now I need to set up alert policies tied to these log-based metrics, which is easy enough to do manually in GCP.
However, I need to do it via Terraform, using this resource:
https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy#nested_alert_strategy
I might be missing something very simple, but I am finding this hard to understand: the alert strategy is apparently required, yet it does not seem to be supported?
I am also a bit confused about which kind of condition I should use to match my already-created log-based metric.
This is my module so far. P.S. I have tried using both the same filter I used when setting up the log-based metric and the name of the log-based metric itself:
resource "google_monitoring_alert_policy" "alert_policy" {
display_name = var.display_name
combiner = "OR"
conditions {
display_name = var.display_name
condition_matched_log {
filter = var.filter
#duration = "600s"
#comparison = "COMPARISON_GT"
#threshold_value = 1
}
}
user_labels = {
foo = "bar"
}
}
var.filter is:
resource.type="gce_route" AND (protoPayload.methodName:"compute.routes.delete" OR protoPayload.methodName:"compute.routes.insert")

Got this resolved in the end. It turns out to be a common issue:
https://issuetracker.google.com/issues/143436657?pli=1
I had to append AND resource.type="global" to the filter parameter in my Terraform module, after the metric name.
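
For illustration, here is a minimal sketch of what the working condition could look like, switching to a condition_threshold that references the log-based metric (the metric name "route-changes", the threshold, and the aggregation values are hypothetical placeholders, not the original module's values):

resource "google_monitoring_alert_policy" "alert_policy" {
  display_name = var.display_name
  combiner     = "OR"

  conditions {
    display_name = var.display_name

    condition_threshold {
      # The filter names the log-based metric and appends the
      # resource.type="global" workaround from the issue above.
      # "route-changes" is a hypothetical metric name.
      filter          = "metric.type=\"logging.googleapis.com/user/route-changes\" AND resource.type=\"global\""
      comparison      = "COMPARISON_GT"
      threshold_value = 0
      duration        = "0s"

      aggregations {
        alignment_period   = "300s"
        per_series_aligner = "ALIGN_RATE"
      }
    }
  }
}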

Related

Tags in Datadog for autoscaling_group when using metrics have changed? (Now aws_autoscaling_groupname)

We have monitors and dashboards templated in Terraform that are used when creating new accounts, and we have found that the ones whose queries group by "autoscaling_group" now report no data.
Looking in Metrics, I can see the only option for grouping by ASG is "aws_autoscaling_groupname", but I can't seem to find where this is set. The AWS Auto Scaling integration documentation also shows that this should be autoscaling_group.
Where can I set this?
If you're generating the Autoscaling group via the Terraform aws_autoscaling_group resource, then there's a name parameter that is distinct from the resource name.
An example that shows the difference:
resource "aws_placement_group" "prod-asg" {
name = "application123"
strategy = "cluster"
}
In this example, when generating dashboards, the ASG name you want to add to widgets would be application123, which should end up as the autoscaling_group name in Datadog.
If using Terraform to build the dashboard widgets, then the reference would be something like this:
resource "datadog_dashboard" "monitoring" {
title = "..."
widget {
type = "timeseries"
title = "..."
request {
q = "avg:aws.autoscaling.desired_capacity{name:${aws_autoscaling_group.prod-asg.name}}.as_count()"
}
}
}
Refs: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/autoscaling_group
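
If you need to group by ASG rather than filter, and your integration now emits the renamed tag, the query would presumably be along these lines (hedged: the exact tag name depends on the Datadog AWS integration version):

q = "avg:aws.autoscaling.desired_capacity{*} by {aws_autoscaling_groupname}"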

How to set log retention days for a CloudFront function in Terraform?

I have an example CloudFront function:
resource "aws_cloudfront_function" "cool_function" {
name = "cool-function"
runtime = "cloudfront-js-1.0"
comment = "The cool function"
publish = true
code = <<EOT
function handler(event) {
var headers = event.request.headers;
if (
typeof headers.coolheader === "undefined" ||
headers.coolheader.value !== "That_is_cool_bro"
) {
console.log("That is not cool bro!")
}
return event.request;
}
EOT
}
When I create this function, the CloudWatch log group /aws/cloudfront/function/cool-function is created automatically.
But the log group's retention policy is Never Expire,
and I can't see any parameter in Terraform that allows setting the retention days.
So the question is:
is it possible to automatically import the aws_cloudwatch_log_group every time a CloudFront function is created, and to change retention_in_days for that resource?
Quite a few AWS services create their log groups implicitly on first use. To prevent that, you need to explicitly create the group before the service has a chance to do it.
For that you define the aws_cloudwatch_log_group with the given name yourself, specify the desired retention, and then create an explicit depends_on relation between the function and the log group to ensure the log group is created first. For migration purposes, you would now need to import already-created log groups into your Terraform state.
resource "aws_cloudfront_function" "cool_function" {
name = "cool-function"
...
depends_on = [
aws_cloudwatch_log_group.logs
]
}
resource "aws_cloudwatch_log_group" "logs" {
name = "/aws/cloudfront/function/cool-function"
retention_in_days = 123
...
}
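
For the migration step, the import would be along these lines (a sketch matching the resource address above; for aws_cloudwatch_log_group the import ID is the log group name):

terraform import aws_cloudwatch_log_group.logs /aws/cloudfront/function/cool-function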

How to create an alert policy for unknown custom metric in GCP

Given the following alert policy in GCP (created with Terraform):
resource "google_monitoring_alert_policy" "latency_alert_policy" {
display_name = "Latency of 95th percentile more than 1 second"
combiner = "OR"
conditions {
display_name = "Latency of 95th percentile more than 1 second"
condition_threshold {
filter = "metric.type=\"custom.googleapis.com/http/server/requests/p95\" resource.type=\"k8s_pod\""
threshold_value = 1000
duration = "60s"
comparison = "COMPARISON_GT"
aggregations {
alignment_period = "60s"
per_series_aligner= "ALIGN_NEXT_OLDER"
cross_series_reducer= "REDUCE_MAX"
group_by_fields = [
"metric.label.\"uri\"",
"metric.label.\"method\"",
"metric.label.\"status\"",
"metadata.user_labels.\"app.kubernetes.io/name\"",
"metadata.user_labels.\"app.kubernetes.io/component\""
]
}
trigger {
count = 1
percent = 0
}
}
}
}
I get the following error (the policy is part of a Terraform project that also creates the cluster):
Error creating AlertPolicy: googleapi: Error 404: The metric referenced by the provided filter is unknown. Check the metric name and labels.
Now, this is a custom metric (published by a Spring Boot app with Micrometer), so the metric does not exist when the infrastructure is created. Does GCP have to know a metric before an alert can be created for it? That would mean a Spring Boot app has to be deployed on the cluster and sending metrics before this policy can be created.
Am I missing something... (like this should not be done in Terraform / infrastructure)?
Interesting question. The reason for the 404 error is that the referenced resource was not found: the metric descriptor apparently has to exist before the alert policy can be created. I would create the metric descriptor first, then go on to create the alerting policy. This way you may be able to avoid the error; please comment if it makes sense, and if you get it working like this, share it.
For reference (this can be referenced from the alert policy, according to the Terraform docs):
resource "google_monitoring_metric_descriptor" "p95_latency" {
description = ""
display_name = ""
type = "custom.googleapis.com/http/server/requests/p95"
metric_kind = "GAUGE"
value_type = "DOUBLE"
labels {
key = "status"
}
labels {
key = "uri"
}
labels {
key = "exception"
}
labels {
key = "method"
}
labels {
key = "outcome"
}
}
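
To make the ordering explicit, the alert policy from the question could then declare a dependency on the descriptor. A minimal sketch based on the resources above:

resource "google_monitoring_alert_policy" "latency_alert_policy" {
  # ... display_name, combiner and conditions as in the question ...

  # Ensure the metric descriptor exists before the policy is created.
  depends_on = [google_monitoring_metric_descriptor.p95_latency]
}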

AWS CodeBuild Branch filter option removed

We are using the AWS CodeBuild 'Branch filter' option to trigger a build only when a push to master is made. However, the 'Branch filter' option has apparently been removed recently, and 'Webhook event filter groups' were added. I expect they provide more functionality, but I cannot see how to reproduce the 'Branch filter' behaviour.
Can someone help?
I couldn't see this change flagged anywhere, but it worked for me setting Event Type as PUSH and HEAD_REF to be
refs/heads/branch-name
as per
https://docs.aws.amazon.com/codebuild/latest/userguide/sample-github-pull-request.html
You need to use filter groups instead of branch_filter.
Example in Terraform (0.12+). For feature branches:
resource "aws_codebuild_webhook" "feature" {
project_name = aws_codebuild_project.feature.name
filter_group {
filter {
type = "EVENT"
pattern = "PULL_REQUEST_CREATED, PULL_REQUEST_UPDATED, PULL_REQUEST_REOPENED"
}
filter {
type = "HEAD_REF"
pattern = "^(?!^/refs/heads/master$).*"
exclude_matched_pattern = false
}
}
}
For the master branch:
resource "aws_codebuild_webhook" "master" {
project_name = aws_codebuild_project.master.name
filter_group {
filter {
type = "EVENT"
pattern = "PUSH"
}
filter {
type = "HEAD_REF"
pattern = "^refs/heads/master$"
exclude_matched_pattern = false
}
}
}
Each webhook requires its own aws_codebuild_project, so you will have two CodeBuild projects per repository.
branch_filter does not work in CodeBuild, although it is still configurable via the UI or API; filter_groups are what carry the required logic.
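
For completeness, a minimal sketch of one such project (the name, role, image, and repository location are all hypothetical placeholders):

resource "aws_codebuild_project" "master" {
  name         = "myrepo-master"             # hypothetical project name
  service_role = aws_iam_role.codebuild.arn  # hypothetical IAM role

  artifacts {
    type = "NO_ARTIFACTS"
  }

  environment {
    compute_type = "BUILD_GENERAL1_SMALL"
    image        = "aws/codebuild/standard:5.0"
    type         = "LINUX_CONTAINER"
  }

  source {
    type     = "GITHUB"
    location = "https://github.com/example/myrepo.git" # hypothetical repo
  }
}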

Get all metrics which have alarms (boto)

I am new to boto and am trying to get all the metrics that have alarms. Can someone please guide me on how to do that? Here is what I am trying to do. I can get all the metrics in the following way:
import boto.ec2.cloudwatch

conn = boto.ec2.cloudwatch.connect_to_region('ap-southeast-1')
metrics = conn.list_metrics()
for metric in metrics:
    print metric.name, metric.namespace
I know that there is a function describe_alarms_for_metric that returns the alarms for a metric. However, it is not working for me and gives me an empty list. Here is what I am trying:
for metric in metrics:
    print conn.describe_alarms_for_metric(metric.name, metric.namespace)
I can also see the list of all alarms using describe_alarms, but I don't know which alarm is for which metric.
alarms = conn.describe_alarms()
for alarm in alarms:
    print alarm
describe_alarms() returns a list of boto.ec2.cloudwatch.alarm objects, which can be inspected to find out the metric and other details about the alarm.
alarms = conn.describe_alarms()
for alarm in alarms:
    print alarm.name
    print alarm.metric
    print alarm.namespace
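
Building on that, here is a small sketch (assuming the same boto connection as above) that groups alarm names by metric, which answers the "which alarm is for what metric" part:

# Map (namespace, metric name) -> list of alarm names
alarms_by_metric = {}
for alarm in conn.describe_alarms():
    key = (alarm.namespace, alarm.metric)
    alarms_by_metric.setdefault(key, []).append(alarm.name)

for (namespace, metric), names in alarms_by_metric.items():
    print namespace, metric, names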
For Boto3, apparently describe_alarms_for_metric() doesn't work unless you also supply a dimension; see the documentation:
Dimensions (list) -- The dimensions associated with the metric. If the metric has any associated dimensions, you must specify them in order for the call to succeed.
(dict) -- Expands the identity of a metric.
Name (string) -- [REQUIRED] The name of the dimension.
Value (string) -- [REQUIRED] The value representing the dimension measurement.
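
For illustration, a call that satisfies that requirement might look like this (a sketch; the instance ID used as the dimension value is a hypothetical placeholder):

import boto3

cloudwatch = boto3.client('cloudwatch')

# The Dimensions must match the metric's dimensions exactly,
# e.g. a per-instance CPUUtilization metric (instance ID is hypothetical).
response = cloudwatch.describe_alarms_for_metric(
    MetricName='CPUUtilization',
    Namespace='AWS/EC2',
    Dimensions=[{'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}],
)
print(response['MetricAlarms'])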
With that requirement I'm not sure what the point of this API is. An alternative is to use describe_alarms() through the paginator and then apply a filter.
You can use the example here as a base:
import boto3

# Create a CloudWatch client
cloudwatch = boto3.client('cloudwatch')

# List alarms of insufficient data through the pagination interface
paginator = cloudwatch.get_paginator('describe_alarms')
for response in paginator.paginate(StateValue='INSUFFICIENT_DATA'):
    print(response['MetricAlarms'])
Then modify it to add a filter:
paginator = cloudwatch.get_paginator('describe_alarms')
page_iterator = paginator.paginate()
filtered_iterator = page_iterator.search("MetricAlarms[?MetricName==`CPUUtilization` && Namespace==`AWS/EC2`]")
for alarm in filtered_iterator:
    print(alarm)
More information in the API docs here and here.