How to extract an instance uptime based on incidents? - google-cloud-platform

On stackdriver, creating an Uptime Check gives you access to the Uptime Dashboard that contains the uptime % of your service:
My problem is that uptime checks are restricted to http/tcp checks. I have other services running and those services report their health in different ways (say, for example, by a specific process running). I have incident policies already set up for this services, so if the service is not running I get notified.
Now I want to be able to look back and know how long the service was down for the last hour. Is there a way to do that?

There's no way to programmatically retrieve alerts at the moment, unfortunately. Many resource types expose uptime as a metric, though (e.g., instance/uptime on GCE instances) - could you pull those and do the math on them? Without knowing what resource types you're using, it's hard to give specific suggestions.
Aaron Sher, Stackdriver engineer

Related

Usage monitoring from whitelisted IPs

I need to setup a shared processing service that uses a load balancer and several EC2 instances to process incoming requests using a custom .NET application. My issue is that I need to be able to bill based on usage. Only white-listed IPs will be able to call the application, but each IP only gets a set number of calls before each call is a billable event.
Since the AWS documentation for the ELB states "We recommend that you use access logs to understand the nature of the requests, not as a complete accounting of all requests", I do not feel the Access Logs on the ELB is what I'm looking for.
The question I have is how to best manage this so that the accounting team has an easy report each month that says how many calls each client made.
Actually you can use Access logs and since access logs will be written to S3, you can query each IP with Athena by using standard SQL. You can analyze your logs and extract reports.
References:
https://docs.aws.amazon.com/athena/latest/ug/what-is.html
https://aws.amazon.com/premiumsupport/knowledge-center/athena-analyze-access-logs/

Is it possible to measure HTTP response latencies without changing my server code?

I have a small number of HTTP servers on GCP VMs. I have a mixture of different server languages and Linux based OS's.
Questions
A. It it possible to use the Stackdriver monitoring service to set alerts at specific percentiles for HTTP response latencies?
B. Can I do this without editing the code of each server process?
C. Will installing the agent into the VM report HTTP latencies?
For example, if the 95th percentile goes over 100ms for a certain time period I want to know.
I know I can do this for CPU utilisation and other hypervisor provided stats using:
https://console.cloud.google.com/monitoring/alerting
Thanks.
Request latencies are extracted by cloud load balancers. As long as you are using cloud load balancer you don't need to install monitoring agent to create alerts based 95th Percentile Metrics.
Monitoring agent captures latencies for some preconfigured systems such as riak, cassandra and some others. Here's a full list of systems and metrics monitoring agent supports by default: https://cloud.google.com/monitoring/api/metrics_agent
But if you want anything custom, i.e. you want to measure request latencies from VM you would need to capture response times yourself and configure logging agent to create a custom metric which you can use to create alerts. And as long as you are capturing them as distribution metrics you should be able to visualise different percentiles (i.e. 25, 50, 75, 80, 90, 95 and 99th etc.) and create alert based on that.
see: https://cloud.google.com/logging/docs/logs-based-metrics/distribution-metrics
A. It it possible to use the Stackdriver monitoring service to set
alerts at specific percentiles for HTTP response latencies?
If you want to simply consider network traffic, yes it is possible. Also if you are using a load balancer it's also possible to set alerts on that.
What you want to do should be pretty straight forward from the interface, however you can also find more info in the documentation.
If you want to use some advanced metric on top of tomcat/apache2 etc, you should check the list of metrics provided by the stackdriver monitoring agent here.
B. Can I do this without editing the code of each server process?
Yes, no need to update any program, stackdriver monitoring works transparently and will be able to fetch basic metrics from a GCP VMs without the need of the monitoring agent, including network traffic and cpu utilization.
C. Will installing the agent into the VM report HTTP latencies?
No, the agent shouldn't cause any http latencies.

AWS: Find IP address of all connections to RDS instances

Is there a way to find the IP addresses of all connections to AWS RDS instances.
You can get connected Host IP from information_schema using MySQL Query.
select host from information_schema.processlist WHERE ID=connection_id();
Because in basic monitoring only these metrics are available.
For Monitoring, choose the option for how you want to view your metrics from these:
CloudWatch:
Shows a summary of DB instance metrics available from Amazon
CloudWatch. Each metric includes a graph showing the metric monitored
over a specific time span.
Enhanced monitoring:
Shows a summary of OS metrics available for a DB instance with
Enhanced Monitoring enabled. Each metric includes a graph showing the
metric monitored over a specific time span.
OS Process list:
Shows details for each process running in the selected instance.
So the option that can help you in this regards is Performance Insights.
Performance Insights:
If you are interested in detail matrics like HOST IPs, the number of connections, Slow queries and many more which can eliminate the need of DBA I believe as very good experience with Using Amazon RDS Performance Insights
Top activity can list any of the dimensions indicated at the top of the list. For Aurora PostgreSQL, Performance Insights currently supports listing top SQL, waits, hosts, and users.
analyzing-amazon-rds-database-workload-with-performance-insights
If you are using MariaDB:
select id, user, host, db, command, time, state, info, progress from information_schema.processlist;
Other options specific to AWS.
Turn on VPC Flowlogs , then use loginishgts to query historical
traffic to the instances.
Inspect the security group of the RDS, it may list the client(s) which is allowed to connect to it.
Turn on the general query log https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_LogAccess.Concepts.MariaDB.html

get alert when instance is not active in GCE

For Compute Engine of GCE, I use stackdriver monitoring for monitoring and alert.
For most of the general metrics like CPU, disk IO, memory ... etc is available and can set alert for those metrics based or dead-or-alive by process name.
However I cannot find any metrics related to status of GCE instance itself.
My use-case is so simply. I'd like to know if the instance id down or not.
Any suggestion appreciated.
thanks.
think the instance status not a monitoring metric; there's just instance/uptime available.
(and I have no clue what it would return when it is terminated, possibly worth a try).
but one can check for servers with Uptime Checks and then report the Incident.
and one can get the instance status with gcloud compute instances describe instance01.

Amazon EC2 Instance Monitoring?

I am in need of a fairly short/simple script to monitor my EC2 instances for Memory and CPU (for now).
After using Get-EC2Instance -Region , it lists all of the instances. from here where can i go?
Cloudwatch is the monitoring tool for AWS instances. While it can support custom metrics, by default it only measures what the hypervisor can see for your instance.
CPU utilization is supported by default, this is often a more accurate way to see your true CPU utilization since the value comes from the hypervisor.
Memory utilization however is not. This depends largely on your OS and is not visible to the hypervisor. However, you can set up a script that will report this metric to Cloudwatch. Some scripts to help you do this are here: http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/mon-scripts-perl.html
There are a few possibilities for monitoring EC2 instances.
Nagios - http://www.nagios.com/solutions/aws-monitoring
StackDriver - http://www.stackdriver.com/
CopperEgg - http://copperegg.com/aws/
But my favorite is Datadog - http://www.datadoghq.com/ - (not just because I work here, but its important to disclose I do work for Datadog.) 5 hosts or less is free and I bet you can be up and running in less than 5 minutes.
Depends what your requirements are for service availability of the monitoring solution itself, as well as how you want to be alerted about host/service notifications.
Nagios, Icinga etc... will allow you to customise an extremely large number of parameters that can be passed to your EC2 hosts, specifying exactly what you want to monitor or check up on. You can run any of the default (or custom) scripts which then feed data back to a central system, then handle those notifications however you want (i.e. send an email, SMS, execute an arbitrary script). Downside of this approach is that you need to self-manage your backend for all of the aggregated monitoring data.
The CloudWatch approach means your instances can push metric data into AWS, then define custom policies around thresholds. For example, 90% CPU usage for more than 5 minutes on an instance or ASG, which might then push a message out to your email via SNS (Simple Notification Service). This method reduces the amount of backend components to manage/maintain, but lacks the extreme customisation abilities of self-hosted monitoring platforms.