get all metrics which have alarms boto - amazon-web-services

I am new to boto and trying to get all the metrics that have alarms. Can someone please guide me on how to do that? Here is what I am trying to do. I can get all the metrics in the following way:
import boto.ec2.cloudwatch

conn = boto.ec2.cloudwatch.connect_to_region('ap-southeast-1')
metrics = conn.list_metrics()
for metric in metrics:
    print metric.name, metric.namespace
I know that there is a function "describe_alarms_for_metric" that returns the alarms for a metric. However, it is not working for me and gives me an empty list. Here is what I am trying:
for metric in metrics:
    print conn.describe_alarms_for_metric(metric.name, metric.namespace)
I can also see the list of all alarms using "describe_alarms", but I don't know which alarm is for which metric:
alarms = conn.describe_alarms()
for alarm in alarms:
    print alarm

describe_alarms() returns a list of boto.ec2.cloudwatch.alarm objects, which can be inspected to find out the metric and other details about the alarm:
alarms = conn.describe_alarms()
for alarm in alarms:
    print alarm.name
    print alarm.metric
    print alarm.namespace
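For the original question (which metrics have alarms), you can invert this into a mapping. A minimal sketch, reusing the conn from the question:

from collections import defaultdict

alarms_by_metric = defaultdict(list)
for alarm in conn.describe_alarms():
    alarms_by_metric[(alarm.namespace, alarm.metric)].append(alarm.name)

# every key in this dict is a metric that has at least one alarm
for (namespace, metric), alarm_names in alarms_by_metric.items():
    print namespace, metric, alarm_names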

For Boto3, describe_alarms_for_metric() apparently doesn't work unless you also supply the metric's dimensions; see the documentation:
Dimensions (list) -- The dimensions associated with the metric. If the metric has any associated dimensions, you must specify them in order for the call to succeed.
    (dict) -- Expands the identity of a metric.
        Name (string) -- [REQUIRED] The name of the dimension.
        Value (string) -- [REQUIRED] The value representing the dimension measurement.
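If you do know the dimensions, the call does return the alarms. A minimal sketch, assuming a standard EC2 metric (the instance ID is a hypothetical placeholder):

import boto3

cloudwatch = boto3.client('cloudwatch')

# Dimensions must match the metric exactly; the InstanceId value here
# is a placeholder
response = cloudwatch.describe_alarms_for_metric(
    MetricName='CPUUtilization',
    Namespace='AWS/EC2',
    Dimensions=[{'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}],
)
for alarm in response['MetricAlarms']:
    print(alarm['AlarmName'])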
With that requirement I'm not sure what the point of this API is. An alternative is to use describe_alarms() through the paginator and then specify a filter.
You can use this example as a base:
import boto3

# Create CloudWatch client
cloudwatch = boto3.client('cloudwatch')

# List alarms of insufficient data through the pagination interface
paginator = cloudwatch.get_paginator('describe_alarms')
for response in paginator.paginate(StateValue='INSUFFICIENT_DATA'):
    print(response['MetricAlarms'])
Then modify it to add a filter:
paginator = cloudwatch.get_paginator('describe_alarms')
page_iterator = paginator.paginate()
filtered_iterator = page_iterator.search(
    "MetricAlarms[?MetricName==`CPUUtilization` && Namespace==`AWS/EC2`]")
for alarm in filtered_iterator:
    print(alarm)
More information is available in the boto3 API docs.

Related

GCP terraform - alerts module based on log metrics

As per the subject, I have set up log-based metrics for a platform in GCP, i.e. firewall, audit, route, etc. monitoring.
Now I need to set up alert policies tied to these log-based metrics, which is easy enough to do manually in GCP.
However, I need to do it via terraform, using this resource:
https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/monitoring_alert_policy#nested_alert_strategy
I might be missing something very simple, but I'm finding this hard to understand: the alert strategy is apparently required, yet it does not seem to be supported?
I am also a bit confused about which kind of condition I should use to match my already set up log-based metric.
This is my module so far. PS: I have tried using the same filter as I did for setting up the log-based metric, as well as the name of the log-based filter:
resource "google_monitoring_alert_policy" "alert_policy" {
display_name = var.display_name
combiner = "OR"
conditions {
display_name = var.display_name
condition_matched_log {
filter = var.filter
#duration = "600s"
#comparison = "COMPARISON_GT"
#threshold_value = 1
}
}
user_labels = {
foo = "bar"
}
}
var.filter is:
resource.type="gce_route" AND (protoPayload.methodName:"compute.routes.delete" OR protoPayload.methodName:"compute.routes.insert")
Got this resolved in the end. It turns out this is a common issue:
https://issuetracker.google.com/issues/143436657?pli=1
I had to add AND resource.type="global" to the filter parameter in my terraform module, after the metric name.
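For illustration, a minimal sketch of what the fixed condition might look like (condition_threshold and the metric name my_log_metric are assumptions based on the issue tracker thread, not the exact module from the question):

resource "google_monitoring_alert_policy" "alert_policy" {
  display_name = var.display_name
  combiner     = "OR"
  conditions {
    display_name = var.display_name
    condition_threshold {
      # hypothetical log-based metric name; the trailing resource.type
      # clause is the fix from the issue tracker link above
      filter          = "metric.type=\"logging.googleapis.com/user/my_log_metric\" AND resource.type=\"global\""
      duration        = "600s"
      comparison      = "COMPARISON_GT"
      threshold_value = 1
    }
  }
}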

How to use Filters with boto3 vpc endpoint services?

I need to get the VPC endpoint service ID from a Python script, but I don't understand how to use boto3 Filters with a vpc-id or a subnet.
How do I use Filters?
This is the relevant part of the boto3 documentation:
(dict) --
A filter name and value pair that is used to return a more specific list of results from a describe operation. Filters can be used to match a set of resources by specific criteria, such as tags, attributes, or IDs. The filters supported by a describe operation are documented with the describe operation. For example:
DescribeAvailabilityZones
DescribeImages
DescribeInstances
DescribeKeyPairs
DescribeSecurityGroups
DescribeSnapshots
DescribeSubnets
DescribeTags
DescribeVolumes
DescribeVpcs
Name (string) --
The name of the filter. Filter names are case-sensitive.
Values (list) --
The filter values. Filter values are case-sensitive.
(string) --
The easiest method would be to call it with no filters, and observe what comes back:
import boto3

ec2_client = boto3.client('ec2', region_name='ap-southeast-2')

response = ec2_client.describe_vpc_endpoint_services()
for service in response['ServiceDetails']:
    print(service['ServiceId'])
You can then either filter the results within your Python code, or use the Filters capability of the Describe command.
Feel free to print(response) to see the data that comes back.
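For example, a minimal sketch of using Filters to list the endpoints in one VPC (the VPC ID is a placeholder), which seems to be what the question is after:

import boto3

ec2_client = boto3.client('ec2', region_name='ap-southeast-2')

# 'vpc-id' is a documented filter for describe_vpc_endpoints();
# the VPC ID value below is a placeholder
response = ec2_client.describe_vpc_endpoints(
    Filters=[{'Name': 'vpc-id', 'Values': ['vpc-01234567890abcdef']}]
)
for endpoint in response['VpcEndpoints']:
    print(endpoint['VpcEndpointId'], endpoint['ServiceName'])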
It depends on what you want to filter the results with. In my case, I use the following to filter for a specific vpc-endpoint-id:
import boto3

vpc_client = boto3.client('ec2')
vpcEndpointId = "vpce-###"

# Note: VpcEndpointIds alone already narrows the result to this endpoint;
# the Filters entry on the same ID is redundant, so either one works on its own
vpcEndpointDetails = vpc_client.describe_vpc_endpoints(
    VpcEndpointIds=[vpcEndpointId],
    Filters=[
        {
            'Name': 'vpc-endpoint-id',
            'Values': [vpcEndpointId]
        },
    ])

How to get the list of instances for AWS EMR?

Why is the instance list for EC2 different from the EMR list?
EC2: https://aws.amazon.com/ec2/spot/pricing/
EMR: https://aws.amazon.com/emr/pricing/
Why aren't all the EC2 instance types available for EMR? How do I get this special list?
In case your question is not about the Amazon console (then it would surely be closed as off-topic): as a programming solution, you are looking for something like this (using python boto3):
import boto3

client = boto3.client('emr')

# list_instances() requires a cluster ID; 'j-XXXXXXXXXXXXX' is a placeholder
response = client.list_instances(ClusterId='j-XXXXXXXXXXXXX')
for instance in response['Instances']:
    print("Instance[%s] %s" % (instance['Ec2InstanceId'], instance['PrivateDnsName']))
This is what I use, although I'm not 100% sure it's accurate, because I couldn't find documentation to support some of my choices (-BoxUsage, etc.).
It's worth looking through the responses from AWS in order to figure out what the different values are for different fields in the pricing client responses.
Use the following to get the list of responses:
import boto3

default_profile = boto3.session.Session(profile_name='default')

# Only us-east-1 has the pricing API
# - https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/pricing.html
pricing_client = default_profile.client('pricing', region_name='us-east-1')

service_name = 'ElasticMapReduce'
aws_region_name = 'US East (N. Virginia)'  # the 'location' filter takes a region's long name
product_filters = [
    {'Type': 'TERM_MATCH', 'Field': 'location', 'Value': aws_region_name}
]

response_list = []
response = pricing_client.get_products(
    ServiceCode=service_name,
    Filters=product_filters,
    MaxResults=100
)
response_list.append(response)

# re-query to get the next pages
while 'NextToken' in response:
    response = pricing_client.get_products(
        ServiceCode=service_name,
        Filters=product_filters,
        MaxResults=100,
        NextToken=response['NextToken']
    )
    response_list.append(response)
Once you've gotten the list of responses, you can then filter out the actual instance info:
import json
from decimal import Decimal

emr_prices = {}
for response in response_list:
    for price_info_str in response['PriceList']:
        price_obj = json.loads(price_info_str)
        attributes = price_obj['product']['attributes']

        # Skip pricing info that doesn't specify an (EC2) instance type
        if 'instanceType' not in attributes:
            continue
        inst_type = attributes['instanceType']

        # AFAIK, only usagetype attributes that contain the string '-BoxUsage'
        # contain the prices that we would use (empirical research). Other
        # examples of values are <REGION-CODE>-M3BoxUsage,
        # <REGION-CODE>-M5BoxUsage, <REGION-CODE>-M7BoxUsage (no clue what that means..)
        if '-BoxUsage' not in attributes['usagetype']:
            continue
        if 'OnDemand' not in price_obj['terms']:
            continue

        on_demand_info = price_obj['terms']['OnDemand']
        price_dim = list(list(on_demand_info.values())[0]['priceDimensions'].values())[0]
        emr_price = Decimal(price_dim['pricePerUnit']['USD'])
        emr_prices[inst_type] = emr_price
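As a quick sanity check, a small sketch that prints a few of the collected prices (assuming the emr_prices dict built above; BoxUsage prices are typically quoted per hour):

for inst_type, price in sorted(emr_prices.items())[:5]:
    print('%s: $%s per hour' % (inst_type, price))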
Realistically, it's straightforward enough to figure this out from the boto3 docs. In particular, the get_products documentation.

Trying to disable all the CloudWatch alarms in one shot

My organization is planning a maintenance window for the next 5 hours. During that time, I do not want CloudWatch to trigger alarms and send notifications.
Earlier, when I had to disable 4 alarms, I have written the following code in AWS Lambda. This worked fine.
import boto3
import collections

client = boto3.client('cloudwatch')

def lambda_handler(event, context):
    response = client.disable_alarm_actions(
        AlarmNames=[
            'CRITICAL - StatusCheckFailed for Instance 456',
            'CRITICAL - StatusCheckFailed for Instance 345',
            'CRITICAL - StatusCheckFailed for Instance 234',
            'CRITICAL - StatusCheckFailed for Instance 123'
        ]
    )
But now, I have been asked to disable all the alarms, which are 361 in number, so including all those names would take a lot of time.
Please let me know what I should do.
Use describe_alarms() to obtain a list of them, then iterate through and disable them:
import boto3

client = boto3.client('cloudwatch')

# describe_alarms() returns at most 100 alarms per call, so paginate to cover all 361
paginator = client.get_paginator('describe_alarms')
names = [alarm['AlarmName']
         for page in paginator.paginate()
         for alarm in page['MetricAlarms']]

# disable_alarm_actions() accepts at most 100 alarm names per call
for i in range(0, len(names), 100):
    disable_response = client.disable_alarm_actions(AlarmNames=names[i:i + 100])
You might want some logic around the Alarm Name to only disable particular alarms.
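For instance, a minimal sketch that keeps only the alarms matching the naming pattern from the question (the 'CRITICAL - ' prefix is just an example):

# only disable alarms whose name starts with the (example) prefix
critical_names = [n for n in names if n.startswith('CRITICAL - ')]
for i in range(0, len(critical_names), 100):
    client.disable_alarm_actions(AlarmNames=critical_names[i:i + 100])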
If you do not have the specific alarm ARNs, then you can use the logic in the previous answer. If you have a specific list of ARNs that you want to disable, you can fetch the names using this:
import boto3

client = boto3.client('cloudwatch')

def get_alarm_names(alarm_arns):
    names = []
    # note: describe_alarms() pages at 100 alarms; paginate if you have more
    response = client.describe_alarms()
    for i in response['MetricAlarms']:
        if i['AlarmArn'] in alarm_arns:
            names.append(i['AlarmName'])
    return names
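A usage sketch (the ARN is a made-up placeholder):

alarm_arns = ['arn:aws:cloudwatch:us-east-1:123456789012:alarm:my-alarm']  # placeholder
client.disable_alarm_actions(AlarmNames=get_alarm_names(alarm_arns))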
Here's a full tutorial: https://medium.com/geekculture/terraform-structure-for-enabling-disabling-alarms-in-batches-5c4f165a8db7

Google PubSub - Counting messages in topic

I've looked over the documentation for Google's PubSub, and also tried looking in Google Cloud Monitoring, but couldn't find any means of figuring out the queue size of my topics.
Since I plan on using PubSub for analytics, it's important for me to monitor the queue count, so I could scale up/down the subscriber count.
What am I missing?
The metric you want to look at is "undelivered messages". You should be able to set up alerts or charts that monitor this metric in Google Cloud Monitoring under the "Pub/Sub Subscription" resource type. The number of messages that have not yet been acknowledged by subscribers, i.e. the queue size, is a per-subscription metric, as opposed to a per-topic metric. For details on the metric, see pubsub.googleapis.com/subscription/num_undelivered_messages in the GCP Metrics List, which also documents all the other available Pub/Sub metrics.
This might help if you're looking into a programmatic way to achieve this:
from google.cloud import monitoring_v3
from google.cloud.monitoring_v3 import query

project = "my-project"
client = monitoring_v3.MetricServiceClient()
result = query.Query(
    client,
    project,
    'pubsub.googleapis.com/subscription/num_undelivered_messages',
    minutes=60).as_dataframe()
print(result['pubsub_subscription'][project]['subscription_name'][0])
The answer to your question is "no": there is no built-in PubSub feature that shows these counts. The way you have to do it is via metric monitoring using Stackdriver (it took me some time to find that out too).
In practical terms, do the following, step by step:
Navigate from GCloud Admin Console to: Monitoring
This opens a new window with separate Stackdriver console
Navigate in Stackdriver: Dashboards > Create Dashboard
Click the Add Chart button top-right of dashboard screen
In the input box, type num_undelivered_messages and then SAVE
Updated version based on @steeve's answer (without the pandas dependency).
Please note that you have to specify end_time instead of using the default utcnow().
import datetime

from google.cloud import monitoring_v3
from google.cloud.monitoring_v3 import query

project = 'my-project'
sub_name = 'my-sub'
client = monitoring_v3.MetricServiceClient()
result = query.Query(
    client,
    project,
    'pubsub.googleapis.com/subscription/num_undelivered_messages',
    end_time=datetime.datetime.now(),
    minutes=1,
).select_resources(subscription_id=sub_name)
for content in result:
    print(content.points[0].value.int64_value)
Here is a Java version:
package com.example.monitoring;

import static com.google.cloud.monitoring.v3.MetricServiceClient.create;
import static com.google.monitoring.v3.ListTimeSeriesRequest.newBuilder;
import static com.google.monitoring.v3.ProjectName.of;
import static com.google.protobuf.util.Timestamps.fromMillis;
import static java.lang.System.currentTimeMillis;

import com.google.monitoring.v3.ListTimeSeriesRequest;
import com.google.monitoring.v3.TimeInterval;

public class ReadMessagesFromGcp {

    public static void main(String... args) throws Exception {
        String projectId = "put here";

        var interval = TimeInterval.newBuilder()
                .setStartTime(fromMillis(currentTimeMillis() - (120 * 1000)))
                .setEndTime(fromMillis(currentTimeMillis()))
                .build();

        var request = newBuilder().setName(of(projectId).toString())
                .setFilter("metric.type=\"pubsub.googleapis.com/subscription/num_undelivered_messages\"")
                .setInterval(interval)
                .setView(ListTimeSeriesRequest.TimeSeriesView.FULL)
                .build();

        var response = create().listTimeSeries(request);

        for (var subscriptionData : response.iterateAll()) {
            var subscription = subscriptionData.getResource().getLabelsMap().get("subscription_id");
            var numberOfMessages = subscriptionData.getPointsList().get(0).getValue().getInt64Value();

            if (numberOfMessages > 0) {
                System.out.println(subscription + " has " + numberOfMessages + " messages");
            }
        }
    }
}
<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-monitoring</artifactId>
    <version>3.3.2</version>
</dependency>
<dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java-util</artifactId>
    <version>4.0.0-rc-2</version>
</dependency>
Output:
queue-1 has 36 messages
queue-2 has 4 messages
queue-3 has 3 messages
There is a way to count all messages published to a topic using custom metrics.
In my case I am publishing messages to a Pub/Sub topic via a Cloud Composer (Airflow) DAG that runs a Python script.
The Python script logs information about the DAG run:
logging.info(
    f"Total events in file {counter - 1}, total successfully published "
    f"{counter - error_counter - 1}, total errors publishing {error_counter}. "
    f"Events sent to topic: {TOPIC_PATH} from filename: {source_blob_name}.",
    {
        "metric": "<some_name>",
        "type": "completed_file",
        "topic": EVENT_TOPIC,
        "filename": source_blob_name,
        "total_events_in_file": counter - 1,
        "failed_published_messages": error_counter,
        "successful_published_messages": counter - error_counter - 1,
    },
)
I then have a Distribution custom metric which filters on resource.type, resource.labels, jsonPayload.metric and jsonPayload.type. The metric also has the Field Name set to jsonPayload.successful_published_messages.
Custom metric filter:
resource.type=cloud_composer_environment AND resource.labels.environment_name={env_name} AND jsonPayload.metric=<some_name> AND jsonPayload.type=completed_file
That custom metric is then used in a Dashboard with the MQL setting of
fetch cloud_composer_environment
| metric
'logging.googleapis.com/user/my_custom_metric'
| group_by 1d, [value_pubsub_aggregate: aggregate(value.pubsub)]
| every 1d
| group_by [],
[value_pubsub_aggregate_sum: sum(value_pubsub_aggregate)]
To get there, I first set up a chart with resource type cloud_composer_environment, metric my_custom_metric, no preprocessing step, alignment function SUM, period 1 day, and group-by function mean.
Ideally you would just select sum for the group-by function, but it errors, which is why you then need to switch to MQL and manually enter sum instead of mean.
This will now count your published messages for up to 24 months, which is the retention period set by Google for custom metrics.