Google Cloud Stackdriver: Metric grouped by IP

I want to create Stackdriver metrics based on the IP and the frequency of requests an IP makes.
Therefore I would like to group my load balancer logs by IP (the IP address of a requesting client) and, if the number of requests exceeds a threshold, send a notification.
Edit:
A workaround to achieve this:
Go to Stackdriver Logging and create a User-defined Metric that counts the total requests.
Fire an alarm when requests exceed a threshold.
The alarm triggers a function that creates a sink from Stackdriver to BigQuery.
Execute the queries to find out which IP is causing the trouble.

In Stackdriver Logging, create a User-defined Metric (myMetric) [1] filtered on the desired IP address.
In Stackdriver Monitoring, find the resource type and metric by locating myMetric, and create the chart from it.
[1] https://cloud.google.com/logging/docs/logs-based-metrics/
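For reference, here is a rough sketch of creating such a logs-based metric from Python with the google-cloud-logging client; the project ID, IP address, and load balancer filter fields are placeholder assumptions, not a definitive implementation.

# Sketch only: create a logs-based metric that counts requests from one IP.
# Project ID, IP address, and filter fields below are assumptions.
from google.cloud import logging

client = logging.Client(project="my-project")   # hypothetical project ID

metric = client.metric(
    "myMetric",
    filter_=(
        'resource.type="http_load_balancer" '
        'AND httpRequest.remoteIp="203.0.113.10"'   # hypothetical client IP
    ),
    description="Requests from a single client IP",
)
metric.create()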

There is no out-of-the-box solution, but there is a workaround using BigQuery:
Go to Stackdriver Logging and create a User-defined Metric that counts the total requests.
Fire an alarm when requests exceed a threshold.
The alarm triggers a function that creates a sink from Stackdriver to BigQuery.
Execute the queries to find out which IP is causing the trouble (see the query sketch below).
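A rough sketch of what that last step might look like, assuming the sink exports the load balancer logs to a BigQuery table; the project, dataset, and table names are placeholders for the sink's destination.

# Sketch only: find the noisiest client IPs in the exported logs.
# Project, dataset, and table names are assumptions about the sink destination.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")   # hypothetical project ID

query = """
    SELECT httpRequest.remoteIp AS ip, COUNT(*) AS request_count
    FROM `my-project.lb_logs.requests_*`          -- hypothetical export table
    WHERE httpRequest.remoteIp IS NOT NULL
    GROUP BY ip
    ORDER BY request_count DESC
    LIMIT 10
"""

for row in client.query(query).result():
    print(row.ip, row.request_count)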

Related

Create an alarm based on a CloudWatch insight query

My problem:
I would like to blacklist IPs which access my public AWS API Gateway endpoint more than 5 times an hour.
My proposed solution:
Requests are logged to CloudWatch
Requests are counted and grouped by IP
An alarm monitors the IPs and sends a message to an SNS topic when the threshold is met
Lambda is triggered by the message and blacklists the IP
I am able to log and count the IPs by using the Insights query below:
fields ip
| stats count() as ipCount by ip
| filter ispresent(ip)
| sort ipCount desc
What I am struggling to accomplish is creating a CloudWatch alarm based on this query.
I have searched a lot but without success. Any ideas on how to create such a metric / alert?
I know you planned to do a custom Lambda, but check if WAF already fulfills your use case. For example, the rate limit section in the article here shows that you can define the rate per 5 minutes for a given IP:
https://docs.aws.amazon.com/waf/latest/developerguide/classic-web-acl-rules-creating.html
If you are not doing anything else, a custom Lambda function may not be needed.
EDIT
If you want to go down the path of CloudWatch alarms, I think you can define a metric filter to create a CloudWatch metric. Then you can create the alarm based on the metric.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/MonitoringLogData.html
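Here is a rough boto3 sketch of that metric filter + alarm combination; the log group name, JSON filter pattern, namespace, and SNS topic are assumptions, and a simple pattern like this counts matching requests rather than grouping by IP.

# Sketch only: metric filter -> CloudWatch metric -> alarm, using boto3.
# Log group, pattern, namespace, and topic ARN below are assumptions.
import boto3

logs = boto3.client("logs")
cloudwatch = boto3.client("cloudwatch")

# Turn matching log events into a custom metric.
logs.put_metric_filter(
    logGroupName="API-Gateway-Execution-Logs",    # hypothetical log group
    filterName="RequestsFromSuspectIp",
    filterPattern='{ $.ip = "203.0.113.10" }',    # hypothetical JSON pattern
    metricTransformations=[{
        "metricName": "SuspectIpRequests",
        "metricNamespace": "Custom/ApiGateway",
        "metricValue": "1",
    }],
)

# Alarm when the metric exceeds 5 requests in an hour.
cloudwatch.put_metric_alarm(
    AlarmName="SuspectIpRequests-High",
    Namespace="Custom/ApiGateway",
    MetricName="SuspectIpRequests",
    Statistic="Sum",
    Period=3600,
    EvaluationPeriods=1,
    Threshold=5,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:blacklist-topic"],
)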
The best approach is to use the managed service AWS WAF, which is perfectly integrated with your APIs.
The problem with a custom solution is the latency (the time to aggregate logs and count them) and the cost, because a Lambda will run queries every time.
In API Gateway you can attach a WAF Web ACL directly, and you can set the rate per 5 minutes, per 10 minutes, etc. as you need; that is the job of WAF.
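For completeness, a sketch of what a rate-based rule might look like through boto3's wafv2 client; the ACL name, rate limit, and visibility settings are placeholder assumptions. The resulting Web ACL would then be associated with the API Gateway stage (e.g. with associate_web_acl).

# Sketch only: a WAF (wafv2) rate-based rule for a regional resource.
# Names, the limit, and visibility settings below are assumptions.
import boto3

wafv2 = boto3.client("wafv2")

wafv2.create_web_acl(
    Name="api-rate-limit",                        # hypothetical Web ACL name
    Scope="REGIONAL",                             # REGIONAL for API Gateway
    DefaultAction={"Allow": {}},
    Rules=[{
        "Name": "BlockNoisyIps",
        "Priority": 0,
        "Statement": {
            "RateBasedStatement": {
                "Limit": 100,                     # requests per 5 minutes, per IP
                "AggregateKeyType": "IP",
            }
        },
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "BlockNoisyIps",
        },
    }],
    VisibilityConfig={
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "ApiRateLimit",
    },
)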

Google Cloud Metric to monitor instance group size

I can find a graph of "Group size" in the page of the instance group.
However, when I try to find this metric in Stackdriver, it doesn't exist.
I tried looking in the metricDescriptors API, but it doesn't seem to be there either.
Where can I find this metric?
I'm particularly interested in sending alerts when this metric goes to 0.
There is not a Stackdriver Monitoring metric for this data yet. You can fetch the size using the instanceGroups.get API call. You could create a system that polls this data and posts it back to Stackdriver Monitoring as a custom metric and then you will be able to access it from Stackdriver.
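A minimal sketch of that polling approach, assuming the google-api-python-client and google-cloud-monitoring libraries; the project, zone, group name, and custom metric type are placeholders.

# Sketch only: poll instanceGroups.get and republish the size as a custom metric.
# Project, zone, group name, and metric type below are assumptions.
import time

import googleapiclient.discovery
from google.cloud import monitoring_v3

PROJECT = "my-project"         # hypothetical project ID
ZONE = "us-central1-a"         # hypothetical zone
GROUP = "my-instance-group"    # hypothetical instance group name


def poll_group_size():
    """Fetch the current group size via the instanceGroups.get API call."""
    compute = googleapiclient.discovery.build("compute", "v1")
    group = compute.instanceGroups().get(
        project=PROJECT, zone=ZONE, instanceGroup=GROUP).execute()
    return group["size"]


def write_custom_metric(size):
    """Write the size to Stackdriver Monitoring as a custom metric."""
    client = monitoring_v3.MetricServiceClient()
    series = monitoring_v3.TimeSeries()
    series.metric.type = "custom.googleapis.com/instance_group/size"
    series.resource.type = "global"
    series.resource.labels["project_id"] = PROJECT
    now = time.time()
    interval = monitoring_v3.TimeInterval(
        {"end_time": {"seconds": int(now), "nanos": int((now % 1) * 1e9)}})
    point = monitoring_v3.Point(
        {"interval": interval, "value": {"int64_value": size}})
    series.points = [point]
    client.create_time_series(
        name=f"projects/{PROJECT}", time_series=[series])


if __name__ == "__main__":
    write_custom_metric(poll_group_size())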

AWS cloudwatch alarm for RDS

Is there a way to create a CloudWatch alarm for my RDS instances based on the percentage of free disk space? (I know I can turn on Enhanced Monitoring and that metric is there, but I can't use those metrics in CloudWatch alarms.)
If not, is there a good way around this?
RDS doesn't report percentage of disk space free, but it does report the amount of free space available. See the list of CloudWatch metrics available for your RDS instances here.
You would need to create alarms on the FreeStorageSpace metric reported by each of your instances.
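As a rough boto3 sketch (the instance identifier, threshold, and SNS topic are placeholders; note that FreeStorageSpace is reported in bytes):

# Sketch only: alarm on FreeStorageSpace for one RDS instance, using boto3.
# Instance name, threshold, and topic ARN below are assumptions.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="rds-free-storage-low",
    Namespace="AWS/RDS",
    MetricName="FreeStorageSpace",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "my-db-instance"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=1,
    Threshold=10 * 1024 ** 3,               # alarm below 10 GB free (bytes)
    ComparisonOperator="LessThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:rds-alerts"],
)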
There is also an option using Enhanced Monitoring and log metrics. Basically, you can turn on Enhanced Monitoring for RDS and then parse the JSON logs to get the usedPercentage value for the storage filesystem. This can be turned into a log metric that can be associated with an alarm.

Cloudwatch Metric showing wrong value

I have an application publishing a custom cloudwatch metric using boto's put_metric_data. The metric shows the number of tasks waiting in a redis queue.
The 1-minute max shows '3', 1-minute min shows '0' and 1-minute average shows '1.5'.
It seems that the application is correctly setting the value to zero, but some other process is overwriting it with 3 at the same time, and I can't find it to stop it.
Is it possible to see logs for PutMetricData to diagnose where this value might be coming from?
Normally, Amazon CloudTrail would be the ideal way to discover information about API calls being made to your AWS account. Unfortunately, PutMetricData is not captured in Amazon CloudTrail.
From Logging Amazon CloudWatch API Calls in AWS CloudTrail:
The CloudWatch GetMetricStatistics, ListMetrics, and PutMetricData API actions are not supported.

Detect thrashing on AWS Auto Scale Group

Sometimes if there are conditions that prevent the app from starting, say a bad config, the auto scaler will continue to start up instances one after the other.
Anybody know of a good way to alert on this?
Most of our servers receive network traffic so we put a CloudWatch monitor on the NetworkIn metric.
I would suggest configuring the start-up script to terminate/shut down the instance upon failure and to send an alert using CloudWatch custom metrics or another service like New Relic.
I don't think there is a way to tell the Auto Scaling group to stop spinning up instances. You could set a maximum instance limit and alert upon reaching that number.
You could alert based on the CloudWatch metric:
Auto Scaling / Group Metrics / GroupTerminatingInstances
See the doc page for more details
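A rough boto3 sketch of that alarm; group metrics are not published by default, so this also enables metrics collection first. The group name, thresholds, and SNS topic are placeholders.

# Sketch only: alarm when the ASG keeps terminating instances, using boto3.
# Group name, thresholds, and topic ARN below are assumptions.
import boto3

autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

# Group metrics are not published by default; enable collection first.
autoscaling.enable_metrics_collection(
    AutoScalingGroupName="my-asg",
    Metrics=["GroupTerminatingInstances"],
    Granularity="1Minute",
)

# Alarm if instances keep terminating over several consecutive periods.
cloudwatch.put_metric_alarm(
    AlarmName="asg-thrashing",
    Namespace="AWS/AutoScaling",
    MetricName="GroupTerminatingInstances",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "my-asg"}],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=3,                     # sustained terminations only
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
)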