How to retrieve AWS CloudWatch metrics using AWSSDK.CloudWatch?

I'm trying to retrieve data about my load balancers using the AWSSDK.CloudWatch package, but I'm having no luck actually getting any values out of it. It seems that no matter what, the Values property of each MetricDataResult in the response is an empty array.
AmazonCloudWatchClient client = new AmazonCloudWatchClient("MyAccessKeyId", "MySecretAccessKey", Amazon.RegionEndpoint.MyRegion);
GetMetricDataRequest request = new GetMetricDataRequest()
{
    StartTime = DateTime.UtcNow.AddHours(-12),
    EndTime = DateTime.UtcNow,
    MetricDataQueries = new List<MetricDataQuery>()
    {
        new MetricDataQuery()
        {
            Id = "MyMetric",
            MetricStat = new MetricStat()
            {
                Metric = new Metric()
                {
                    Namespace = "AWS/ELB",
                    MetricName = "HealthyHostCount",
                    Dimensions = new List<Dimension>()
                    {
                        new Dimension()
                        {
                            Name = "LoadBalancerName",
                            Value = "MyLoadBalancerName"
                        }
                    }
                },
                Period = 300,
                Stat = "Sum",
                Unit = "None"
            }
        }
    },
    ScanBy = ScanBy.TimestampDescending,
    MaxDatapoints = 1000
};
GetMetricDataResponse response = client.GetMetricData(request);
I'm struggling to find any relevant examples of this. I'd prefer to be able to obtain this value per load balancer.

There are many things that could cause your query to return no data. This is how I would approach debugging it:
Was the response 200 OK? If not, something is wrong with the query itself: a required parameter is missing, the credentials are not valid, or the policy does not allow GetMetricData calls.
Is the metric name correct? The full metric identity must be correct, and that includes the namespace, the metric name, and all of the dimensions. CloudWatch does not distinguish between a no-data case and a no-such-metric case; you just get no data back. This is a potential issue in your request: if your hosts are in a target group, you may need to specify the target group dimension.
Is the region endpoint correct? Metrics are separated by region, and you have to call the correct region endpoint.
Are the credentials from the correct account?
Is the unit correct? If you are not sure about the unit, don't specify it. This is the second thing that could be an issue with your request: this metric could have the unit Count. Try it without specifying the unit (see the sketch after this list).
Is the time range correct? Was data being published for the time range you are requesting?
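
Putting those checks together, here is a minimal, self-contained sketch of the same query with the common pitfalls addressed. The region and load balancer name are placeholders you would substitute; the Unit is omitted, the query Id is lowercase (GetMetricData query ids must start with a lowercase letter), and both the HTTP status and the per-query results are inspected:

using System;
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;
using Amazon;
using Amazon.CloudWatch;
using Amazon.CloudWatch.Model;

class MetricProbe
{
    static async Task Main()
    {
        // Credentials come from the default chain (profile, env vars, instance role);
        // the region must be the one the load balancer actually lives in.
        var client = new AmazonCloudWatchClient(RegionEndpoint.EUWest1);

        var request = new GetMetricDataRequest
        {
            StartTime = DateTime.UtcNow.AddHours(-12),
            EndTime = DateTime.UtcNow,
            MetricDataQueries = new List<MetricDataQuery>
            {
                new MetricDataQuery
                {
                    Id = "healthyHosts", // must start with a lowercase letter
                    MetricStat = new MetricStat
                    {
                        Metric = new Metric
                        {
                            Namespace = "AWS/ELB",
                            MetricName = "HealthyHostCount",
                            Dimensions = new List<Dimension>
                            {
                                new Dimension { Name = "LoadBalancerName", Value = "MyLoadBalancerName" }
                            }
                        },
                        Period = 300,
                        Stat = "Sum"
                        // Unit deliberately omitted; let CloudWatch match any unit
                    }
                }
            },
            ScanBy = ScanBy.TimestampDescending,
            MaxDatapoints = 1000
        };

        GetMetricDataResponse response = await client.GetMetricDataAsync(request);
        if (response.HttpStatusCode != HttpStatusCode.OK)
        {
            Console.WriteLine($"Request failed: {response.HttpStatusCode}");
            return;
        }

        foreach (MetricDataResult result in response.MetricDataResults)
        {
            Console.WriteLine($"{result.Id}: {result.Values.Count} datapoints (status {result.StatusCode})");
            for (int i = 0; i < result.Values.Count; i++)
            {
                Console.WriteLine($"  {result.Timestamps[i]:u}  {result.Values[i]}");
            }
        }
    }
}

If this still returns zero datapoints alongside a 200 OK, the metric identity (namespace, name, dimensions) is the most likely culprit.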

Related

How to set log retention days for a CloudFront function in Terraform?

I have an example CloudFront function:
resource "aws_cloudfront_function" "cool_function" {
name = "cool-function"
runtime = "cloudfront-js-1.0"
comment = "The cool function"
publish = true
code = <<EOT
function handler(event) {
var headers = event.request.headers;
if (
typeof headers.coolheader === "undefined" ||
headers.coolheader.value !== "That_is_cool_bro"
) {
console.log("That is not cool bro!")
}
return event.request;
}
EOT
}
When I create this function, the CloudWatch log group /aws/cloudfront/function/cool-function is created automatically.
But the log group's retention policy is Never Expire, and I can't see any parameter in Terraform that allows setting the retention days.
So the question is:
Is it possible to automatically import the aws_cloudwatch_log_group every time a CloudFront function is created, and change retention_in_days for that resource?
Quite a few AWS services create their log groups implicitly on first use. To prevent that, you need to explicitly create the group before the service has a chance to do it.
For that, define the aws_cloudwatch_log_group with the given name yourself, specify the desired retention, and then create an explicit depends_on relation between the function and the log group to ensure the log group is created first. For migration purposes, you would now need to import the already-created log groups into your Terraform state (see the import example after the code).
resource "aws_cloudfront_function" "cool_function" {
name = "cool-function"
...
depends_on = [
aws_cloudwatch_log_group.logs
]
}
resource "aws_cloudwatch_log_group" "logs" {
name = "/aws/cloudfront/function/cool-function"
retention_in_days = 123
...
}
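
For the migration case, a one-off import along these lines should work; the resource address matches the example above, and the import ID for aws_cloudwatch_log_group is the log group name the service already created:

terraform import aws_cloudwatch_log_group.logs /aws/cloudfront/function/cool-function

After the import, a terraform plan should show no changes other than the retention_in_days update.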

How to modify the request/response to directly get the prices of an AWS virtual machine?

I am currently learning how to use the AWS Pricing SDK. My objective is to get all the prices of AWS virtual machines, as prices can differ from one region to another.
Basically, I am running this code:
AmazonPricingClient client = new(keyId, key, RegionEndpoint.USEast1);

// Development filters to handle a smaller amount of data
GetProductsRequest request = new() {
    ServiceCode = "AmazonEC2",
    Filters = new() {
        new() {
            Field = "vcpu",
            Type = "TERM_MATCH",
            Value = "2"
        },
        new() {
            Field = "currentGeneration",
            Type = "TERM_MATCH",
            Value = "Yes"
        },
        new() {
            Field = "regionCode",
            Type = "TERM_MATCH",
            Value = "eu-west-1"
        },
        new() {
            Field = "operatingSystem",
            Type = "TERM_MATCH",
            Value = "Windows"
        }
    }
};
GetProductsResponse response = await client.GetProductsAsync(request);
Taking the filters into consideration (they were added to reduce the amount of data while testing the code), I will only get the prices of the matching virtual machines for region eu-west-1.
If I remove the region filter (for production, for example), I will get the prices for every region, but each time I will also get this part of the returned JSON:
"product":{
"productFamily":"Compute Instance",
"attributes":{
"enhancedNetworkingSupported":"Yes",
"intelTurboAvailable":"No",
"memory":"16 GiB",
"dedicatedEbsThroughput":"Up to 3500 Mbps",
[...]
"operation":"RunInstances:000g",
"availabilityzone":"NA"
},
"sku":"2A56CED7V5PFGAH8"
}
And this part would be duplicated for each region.
Is there a way to tell the API that I just want the different prices of a specific virtual machine, using either the request or the response objects?
I may have missed some possibilities offered by the SDK; feel free to point out anything I can improve in that snippet, good practices, ...
Thanks!
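
One thing worth knowing when working with this response: GetProductsResponse returns each product as a raw JSON string in its PriceList property, so any trimming has to happen client-side. Here is a minimal sketch of pulling just the SKU and the on-demand price dimensions out of each entry, assuming the standard OnDemand terms layout of the EC2 price list (the nested property names are taken from the published price list format, not from the question):

using System;
using System.Text.Json;

// "response" is the GetProductsResponse from the snippet above.
foreach (string priceListJson in response.PriceList)
{
    using JsonDocument doc = JsonDocument.Parse(priceListJson);
    JsonElement root = doc.RootElement;

    string sku = root.GetProperty("product").GetProperty("sku").GetString();

    // Layout: terms.OnDemand.<offerTermCode>.priceDimensions.<rateCode>.pricePerUnit.USD
    if (root.TryGetProperty("terms", out JsonElement terms) &&
        terms.TryGetProperty("OnDemand", out JsonElement onDemand))
    {
        foreach (JsonProperty offer in onDemand.EnumerateObject())
        {
            foreach (JsonProperty dim in offer.Value.GetProperty("priceDimensions").EnumerateObject())
            {
                string usd  = dim.Value.GetProperty("pricePerUnit").GetProperty("USD").GetString();
                string unit = dim.Value.GetProperty("unit").GetString();
                Console.WriteLine($"{sku}: {usd} USD per {unit}");
            }
        }
    }
}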

GCP terraform - alerts module based on log metrics

As per the subject, I have set up log-based metrics for a platform in GCP, i.e. firewall, audit, route, etc. monitoring.
Now I need to set up alert policies tied to these log-based metrics, which is easy enough to do manually in GCP.
However, I need to do it via Terraform, thus using this resource:
https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy#nested_alert_strategy
I might be missing something very simple, but I'm finding this hard to understand, as the alert strategy is apparently required yet does not seem to be supported.
I am also a bit confused about which kind of condition I should be using to match my already-set-up log-based metric.
This is my module so far. PS: I have tried using the same filter as I did when setting up the log-based metric, as well as the name of the log-based metric:
resource "google_monitoring_alert_policy" "alert_policy" {
display_name = var.display_name
combiner = "OR"
conditions {
display_name = var.display_name
condition_matched_log {
filter = var.filter
#duration = "600s"
#comparison = "COMPARISON_GT"
#threshold_value = 1
}
}
user_labels = {
foo = "bar"
}
}
The filter variable is:
resource.type="gce_route" AND (protoPayload.methodName:"compute.routes.delete" OR protoPayload.methodName:"compute.routes.insert")
Got this resolved in the end.
It turns out to be a common issue:
https://issuetracker.google.com/issues/143436657?pli=1
I had to add AND resource.type="global" to the filter parameter in my Terraform module, after the metric name (see the sketch below).
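
For anyone piecing this together, the resulting condition would look something like this sketch, assuming a log-based metric named route-changes and a threshold-style condition (the metric name, duration, and threshold here are placeholders, not from the original question):

conditions {
  display_name = var.display_name

  condition_threshold {
    # Log-based metrics are exposed as logging.googleapis.com/user/<metric-name>
    filter          = "metric.type=\"logging.googleapis.com/user/route-changes\" AND resource.type=\"global\""
    duration        = "600s"
    comparison      = "COMPARISON_GT"
    threshold_value = 1

    aggregations {
      alignment_period   = "300s"
      per_series_aligner = "ALIGN_COUNT"
    }
  }
}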

How to create an alert policy for unknown custom metric in GCP

Given the following alert policy in GCP (created with Terraform):
resource "google_monitoring_alert_policy" "latency_alert_policy" {
display_name = "Latency of 95th percentile more than 1 second"
combiner = "OR"
conditions {
display_name = "Latency of 95th percentile more than 1 second"
condition_threshold {
filter = "metric.type=\"custom.googleapis.com/http/server/requests/p95\" resource.type=\"k8s_pod\""
threshold_value = 1000
duration = "60s"
comparison = "COMPARISON_GT"
aggregations {
alignment_period = "60s"
per_series_aligner= "ALIGN_NEXT_OLDER"
cross_series_reducer= "REDUCE_MAX"
group_by_fields = [
"metric.label.\"uri\"",
"metric.label.\"method\"",
"metric.label.\"status\"",
"metadata.user_labels.\"app.kubernetes.io/name\"",
"metadata.user_labels.\"app.kubernetes.io/component\""
]
}
trigger {
count = 1
percent = 0
}
}
}
}
I get the following error (the policy is part of a Terraform project that also creates the cluster):
Error creating AlertPolicy: googleapi: Error 404: The metric referenced by the provided filter is unknown. Check the metric name and labels.
Now, this is a custom metric (published by a Spring Boot app with Micrometer), so the metric does not exist at the time the infrastructure is created. Does GCP have to know about a metric before an alert can be created for it? That would mean a Spring Boot app has to be deployed on the cluster and sending metrics before this policy can be created.
Am I missing something (like this should not be done in Terraform/infrastructure)?
Interesting question. The reason for the 404 is that the resource was not found: the metric descriptor appears to be a prerequisite for the alert policy. I would create the metric descriptor first, and then go forward with creating the alerting policy.
This is a way you may be able to avoid the error. Please comment if it makes sense, and if you make it work like this, share it.
For reference (this descriptor can be referenced from the alert policy according to the Terraform docs; a depends_on sketch follows the descriptor):
resource "google_monitoring_metric_descriptor" "p95_latency" {
description = ""
display_name = ""
type = "custom.googleapis.com/http/server/requests/p95"
metric_kind = "GAUGE"
value_type = "DOUBLE"
labels {
key = "status"
}
labels {
key = "uri"
}
labels {
key = "exception"
}
labels {
key = "method"
}
labels {
key = "outcome"
}
}
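
To make the creation order explicit, the alert policy can then depend on the descriptor. A minimal sketch (the depends_on wiring is an assumption about how you would connect the two, not something from the original answer):

resource "google_monitoring_alert_policy" "latency_alert_policy" {
  # ... conditions as in the question ...
  depends_on = [
    google_monitoring_metric_descriptor.p95_latency
  ]
}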

Setting "count" based on the length of an attribute on another resource

I have a fairly simple Terraform configuration, which creates a Route53 zone and then creates NS records in Cloudflare to delegate the subdomain to that zone. At present, it assumes there are always exactly four authoritative DNS servers for every Route53 zone and creates four separate cloudflare_record resources, but I'd like to generalise that, partly because who knows whether AWS will start putting a fifth authoritative server out there in the future, but also as a "test case" for more complicated things in the future (like AWS AZs, which I know vary in count between regions).
What I've come up with so far is:
resource "cloudflare_record" "public-zone-ns" {
domain = "example.com"
name = "${terraform.env}"
type = "NS"
ttl = "120"
count = "${length(aws_route53_zone.public-zone.name_servers)}"
value = "${lookup(aws_route53_zone.public-zone.name_servers, count.index)}"
}
resource "aws_route53_zone" "public-zone" {
name = "${terraform.env}.example.com"
}
When I run terraform plan over this, though, I get this error:
Error running plan: 1 error(s) occurred:
* cloudflare_record.public-zone-ns: cloudflare_record.public-zone-ns: value of 'count' cannot be computed
I think what that means is that, because the aws_route53_zone hasn't actually been created yet, Terraform doesn't know what length(aws_route53_zone.public-zone.name_servers) is, and therefore the interpolation into cloudflare_record.public-zone-ns.count fails and I'm screwed.
However, it seems surprising to me that Terraform would be so inflexible; surely being able to create a variable number of resources like this would be meat-and-potatoes stuff. Hard-coding the length, or creating separate resources, just seems so... limiting.
So, what am I missing? How can I create a number of resources when I don't know in advance how many I need?
count not being computable is currently an open issue in Terraform: https://github.com/hashicorp/terraform/issues/12570
As a workaround, you could move the name servers into a variable list and take the length of that, all in one Terraform script, as sketched below.
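A minimal sketch of that workaround, in the same pre-0.12 syntax as the question (the server names are placeholders; the trade-off is that they no longer flow automatically from the zone):

variable "name_servers" {
  type    = "list"
  default = ["ns-1.awsdns-00.com", "ns-2.awsdns-00.net", "ns-3.awsdns-00.org", "ns-4.awsdns-00.co.uk"]
}

resource "cloudflare_record" "public-zone-ns" {
  domain = "example.com"
  name   = "${terraform.env}"
  type   = "NS"
  ttl    = "120"
  count  = "${length(var.name_servers)}"
  value  = "${element(var.name_servers, count.index)}"
}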
I get your point; this surprised me as well. Even adding depends_on to the cloudflare_record resource does not help.
What you can do to get past this issue is to split it into two stacks, making sure the Route 53 zone is created before the Cloudflare record.
Stack #1
resource "aws_route53_zone" "public-zone" {
name = "${terraform.env}.example.com"
}
output "name_servers" {
value = "${aws_route53_zone.public-zone.name_servers}"
}
Stack #2
data "terraform_remote_state" "route53" {
backend = "s3"
config {
bucket = "terraform-state-prod"
key = "network/terraform.tfstate"
region = "us-east-1"
}
}
resource "cloudflare_record" "public-zone-ns" {
domain = "example.com"
name = "${terraform.env}"
type = "NS"
ttl = "120"
count = "${length(data.terraform_remote_state.route53.name_servers)}"
value = "${element(data.terraform_remote_state.route53.name_servers, count.index)}"
}