I am going through the documentation of CloudWatch alarms https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html. There are example scenario tables in the Configuring How CloudWatch Alarms Treat Missing Data section. I am having difficulty understanding what is going on.
In the last two rows, why is the behaviour of the Missing and Ignore columns different?
First of all, the last two rows are quite different from each other. Although they both have two missing data points, the very last row's most recent data point is an 'X', which is breaching/bad, while the second-to-last row's most recent data point is an 'O', which is OK/good. With missing data treated as MISSING or IGNORE, the second-to-last row is considered OK even though it is missing two data points. It is reasonable that the MISSING and IGNORE settings are more permissive than BREACHING.
In the last row, MISSING and IGNORE also behave differently. This is because IGNORE is more permissive than MISSING: as you can see, IGNORE will "Retain current state". This means that under those circumstances your alarm simply stays as it is until new data points come in and break the current pattern.
The rationale behind MISSING's behavior in the last row is that, although we see a single bad data point, we need more data points to determine whether the next alarm state should be good or bad, or INSUFFICIENT_DATA if no more data points arrive.
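For completeness, the missing-data treatment is just a per-alarm setting. A minimal boto3 sketch (the alarm name, namespace, metric, and threshold here are made up for illustration):

import boto3

cloudwatch = boto3.client("cloudwatch")

# Hypothetical alarm: 3 evaluation periods, all 3 must breach,
# with missing data treated as "ignore".
cloudwatch.put_metric_alarm(
    AlarmName="my-example-alarm",          # made-up name
    Namespace="MyApp",                     # made-up namespace/metric
    MetricName="Errors",
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=3,
    DatapointsToAlarm=3,
    Threshold=1.0,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="ignore",             # or "missing", "breaching", "notBreaching"
)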
I'm making a time-spending tracker based on the work I do every hour of the day.
Now, suppose I have 28 types of work listed in my tracker (a list I also have to grow from time to time), and I have about 8 predefined significance values that I have decided to relate to these 28 types of work.
As soon as I enter a type of work in cell 1, I want the adjacent cell 2 to be automatically populated with the significance value (from the range of 8 values) that I have predefined for it.
Every time I input a new or old occurrence of a type of work, the adjacent cell should automatically be matched with its relevant significance value and populated in real time.
I know how to do it using IF, IFS, and IF_OR conditions, but given the ever-expanding types of work and significance values, those formulas will become very big, complicated, and repetitive in the future. I feel there's a more efficient way to achieve it. Also, I don't want the value to be selected from a drop-down list.
Guys, please help me out with the most efficient way to handle this. TUIA :)
Also, I've added a snapshot and a sample sheet describing the problem.
XLOOKUP() may work. Try:
=XLOOKUP(D2,A2:A,B2:B)
Or the FILTER() function, like:
=FILTER(B2:B,A2:A=D2)
You can use this formula for a whole column:
=INDEX(IFERROR(VLOOKUP(C14:C,A2:B9,2,0)))
Adapt the ranges to your actual tables so that the second argument includes all the potential values and their significances.
This is the formula that worked for me (for anybody's reference):
I created another reference sheet stating the types of work and their significance. From that sheet, I'm using either VLOOKUP, FILTER, or XLOOKUP. I'm using Google Forms to input my data.
=ARRAYFORMULA(IFS(ROW(D:D)=1,"Significance",A:A="","",TRUE,VLOOKUP(D:D,Reference!$A:$B,2,0)))
Reading the formula: row 1 gets the "Significance" header, rows where column A is blank stay blank, and everything else is looked up against Reference!A:B.
I have a service, and I want to know how many errors it throws.
So I've created a metric and an alert based on that metric.
The metric is a counter, and it filters out all the unneeded logs, leaving only the relevant ones.
The alert uses the metric, with an aggregator of type 'count' and an aligner of type 'delta', resulting in a value of '1' when the metric catches any errors.
The condition for the alert is to check if the most recent value is above 0.99.
After an incident from that alert has been fired, it just won't close.
I went to the summary page and it shows that for some reason the condition is still being met (at least, that is what I understand from the red lines that keep increasing), even though the errors were last thrown a few hours ago.
In the picture you can see the red lines which indicates the duration of the incident, and below it in the graph you can see three small points where an error was detected. The first one caused the incident to fire.
Any help on how to make the incident resolve?
Thanks!
Was able to fix the problem as soon as I set the aggregator to 'sum' instead of 'count'.
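In case it helps anyone, here is a rough sketch of the aggregation that worked, using the google-cloud-monitoring Python client. The 5-minute alignment period is an assumption; the point is summing the per-series deltas (REDUCE_SUM) instead of counting points (presumably REDUCE_COUNT before):

from google.cloud import monitoring_v3

# Sum the aligned deltas across series instead of counting data points,
# so the condition's value drops back to 0 once errors stop.
aggregation = monitoring_v3.Aggregation(
    alignment_period={"seconds": 300},  # assumed 5-minute window
    per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_DELTA,
    cross_series_reducer=monitoring_v3.Aggregation.Reducer.REDUCE_SUM,
)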
Reading up on many Kimball design tips regarding fact tables (transaction, accumulating, periodic, etc.), I'm still vague on what I should do in my case of updating a fact table, which I believe is not that uncommon. On to the case.
We're processing complaints from clients, and we want to be able to reflect the current status of a complaint in the Data Warehouse. Our complaints have a workflow of statuses they go through and different assignees that deal with them over time, but for our analysis this is irrelevant as of now. We would like to review what the current situation on a complaint is.
To my understanding, the grain of the fact table would be a single complaint, with columns (irrelevant for this question whether they should be a junk dimension, degenerate, etc.) such as:
Complaint Number
Current Status
Current Status Date
Current Assignee
Type of complaint
As far as I understand, since we don't want to view the process history but instead see the current status of the process, storing multiple rows for each complaint representing its states is overkill, so instead we store only one row per complaint and update it.
Now, is my reasoning correct? In the above case, complaint number and type of complaint hold values that don't change, while the "Current" columns do, and we need to update the row; so we could implement a Change Data Capture mechanism (just like we do for dimensions right now) to compare incoming rows from the source system with currently stored fact rows, to improve the time cost of such an operation.
It honestly looks like a Dimension table with mixed SCD Types 0 and 1 to me, but it stores the facts of receiving complaints.
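A minimal sketch of that compare-and-update step, assuming the fact rows are keyed by complaint number and held in a dict (all names here are illustrative):

def apply_cdc(stored_facts, incoming_rows):
    """Insert new complaints; Type 1-style overwrite of the "Current" columns."""
    for row in incoming_rows:
        key = row["complaint_number"]
        current = stored_facts.get(key)
        if current is None:
            stored_facts[key] = row          # first time we see this complaint
        elif (current["current_status"], current["current_assignee"]) != (
              row["current_status"], row["current_assignee"]):
            current.update(                  # only the mutable columns change
                current_status=row["current_status"],
                current_status_date=row["current_status_date"],
                current_assignee=row["current_assignee"],
            )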
SO Post for reference: Fact table with information that is regularly updatable in source system
Edit
I'm aware that I could use an accumulating fact table with timestamps, which is somewhat like SCD Type 2, but the end user doesn't really care about the history of the process. There are more facts involved in the analysis later on, so separating this need out of the data warehouse doesn't really work in this case.
I’ve encountered similar use cases in the past, where an accumulating snapshot would be the default solution.
However, the accumulating snapshot doesn't allow processes of varying length. I've designed a different pattern, where two rows are added for each event: if an object goes from state A to state B, you first insert a row with state A and quantity -1, then a new one with state B and quantity +1 (see the sketch after the list below).
The end result allows:
- no updates necessary, only inserts;
- map-reduce friendly;
- arbitrary length processes;
- counting how many objects are in each state at any point in time (with the help of a periodic snapshot, for performance reasons);
- counting how many entered or left any state at any point in time;
- calculating time spent in each state and overall age.
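A minimal sketch of the two-row insert described above (column names are illustrative):

def transition_rows(complaint_id, from_state, to_state, ts):
    """Each status change yields two inserts: -1 leaving the old state, +1 entering the new."""
    rows = []
    if from_state is not None:  # the very first event has no prior state
        rows.append({"complaint": complaint_id, "state": from_state, "qty": -1, "ts": ts})
    rows.append({"complaint": complaint_id, "state": to_state, "qty": +1, "ts": ts})
    return rows

# Summing qty per state over all rows up to a point in time gives the
# number of complaints currently in each state at that time.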
Details in 5 blog posts here (with implementation in Pentaho Data Integration):
http://ubiquis.co.uk/dwh/status-change-fact-table-part-1-the-problem/
From the docs:
No matter what value you set for how to treat missing data, when an alarm evaluates whether to change state, CloudWatch attempts to retrieve a higher number of data points than specified by Evaluation Periods. The exact number of data points it attempts to retrieve depends on the length of the alarm period and whether it is based on a metric with standard resolution or high resolution. The timeframe of the data points that it attempts to retrieve is the evaluation range.
The docs go on to give an example of an alarm with 'EvaluationPeriods' and 'DatapointsToAlarm' set to 3. They state that CloudWatch chooses the 5 most recent data points. Part of my question is: where are they getting 5? It's not clear from the docs.
The second part of my question is, why have this behavior at all (or at least, why have it by default)? If I set my evaluation period to 3, my Datapoints to Alarm to 3, and tell Cloudwatch to 'TreatMissingData' as 'breaching,' I'm going to expect 3 periods of missing data to trigger an alarm state. This doesn't necessarily happen, as illustrated by an example in the docs.
I had the same questions. As best I can tell, the 5 can be explained if I am thinking about standard collection intervals vs. standard resolution correctly. In other words, if we assume a standard collection interval of 5 minutes and a standard 1-minute resolution, then within the 5 minutes of the collection interval, 5 separate data points are collected.
The example states you need 3 data points over 3 evaluation periods, which is less than the 5 data points CloudWatch has collected. CloudWatch would then have all the data points it needs within the 5-data-point evaluation range defined by a single collection. As an example, if 4 of the 5 expected data points are missing from the collection, you have one defined data point and thus need 2 more within the evaluation range to reach the three needed for alarm evaluation. These 2 (not the 4 that are actually missing from the collection) are considered the "missing" data points in the documentation - I find this confusing.
The tables in the AWS documentation provide examples of how the different treatments of those 2 "missing" data points impact the alarm evaluations.
Regardless of whether this is the correct interpretation, this could be better explained in the documentation.
I also agree that this behavior is unexpected, and the fact that you can't configure it is quite frustrating. However, there does seem to be an easy workaround depending on your use case.
I also wanted the same behavior as you specified; i.e. a missing data point is a breaching data point plain and simple:
If I set my evaluation period to 3, my Datapoints to Alarm to 3, and tell Cloudwatch to 'TreatMissingData' as 'breaching,' I'm going to expect 3 periods of missing data to trigger an alarm state.
I had a use case which is basically like a push-style health monitor. We needed a particular on-premises service to report a "healthy" metric daily to CloudWatch, and an alarm in case this report didn't come through due to network issues or anything disruptive. Semantically, missing data is the same as reporting a metric of value 0 (the "healthy" metric is value 1).
So I was able to use metric math's FILL function to replace every missing data point with 0. Setting a 1-out-of-1 alarm with an "alarm on < 1" condition on this new expression provides exactly the needed behavior without involving any kind of "missing data".
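Concretely, the alarm is defined on the metric math expression rather than on the raw metric. A boto3 sketch of such an alarm; the namespace, metric name, and daily period are placeholders for my health-report setup:

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="daily-health-report",                      # made-up name
    EvaluationPeriods=1,
    DatapointsToAlarm=1,
    Threshold=1.0,
    ComparisonOperator="LessThanThreshold",               # alarm when healthy < 1
    Metrics=[
        {
            "Id": "m1",
            "MetricStat": {
                "Metric": {"Namespace": "OnPrem/Health",  # placeholder namespace
                           "MetricName": "Healthy"},
                "Period": 86400,                          # one data point per day
                "Stat": "Sum",
            },
            "ReturnData": False,
        },
        {
            "Id": "e1",
            "Expression": "FILL(m1, 0)",                  # missing data becomes 0
            "Label": "healthy_filled",
            "ReturnData": True,                           # the alarm evaluates this
        },
    ],
)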
I've been trying to solve this problem for quite some time, but I am having trouble with it.
Let's say on a trigger, you receive values.
First trigger: You get 1
Second trigger: You get 1, 2
Third trigger: You get 1, 2, 3
So, I store 1.
For the 2nd trigger, I store 2, since 1 already exists.
For the 3rd trigger, I store 3, since 1 and 2 already exist,
so in total I have stored 1,2,3
As you can see, we can easily check for new values, if old != new.
Here comes the problem:
Fourth trigger: You get 1, 2, 4
For the 4th trigger, I store 1, 2 because they already exist,
but how do I check against 3, remove 3 from the store, and check whether 4 is new?
If you are having problems understanding this, feel free to clarify. Thanks!
Use a std::set<int> container. When a trigger arrives, clear it and insert all the values from the trigger. This should be OK if you work with just a few numbers (about ten or so). With more, a somewhat more sophisticated approach might be required.
Hard to tell what you're asking exactly, but see the std::set data structure if your main problem is maintaining a set of unique numbers and efficiently checking for existence in the set.
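To make the set idea concrete, here's a sketch of the diff logic (in Python for brevity; a C++ version using std::set and std::set_difference is analogous):

stored = set()

def on_trigger(values):
    """Diff the incoming snapshot against what we kept from the last trigger."""
    incoming = set(values)
    added = incoming - stored      # values not seen before, e.g. {4} on the 4th trigger
    removed = stored - incoming    # values that disappeared, e.g. {3} on the 4th trigger
    stored.clear()
    stored.update(incoming)        # the new snapshot replaces the old contents
    return added, removed

# on_trigger([1, 2, 3])  -> added {1, 2, 3} on the first call
# on_trigger([1, 2, 4])  -> added {4}, removed {3}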
Your logic changed between 1,2,3 and 1,2,4
(only stored 3 on former, but stored 1,2,4 on latter)
In that case, ignore data received that already exists, only storing new data, unless some old data was not sent, in which case you'll create a new set of data to store.
But, I'm guessing that's not what you had in mind at all :)
edit
I see it's been edited now, so my answer is invalid
edit-2
The fastest way is to drop all stored data on each iteration, as comparisons will take as long (if not longer) than a complete save of the sent data.
Your approach sounds like it is better served by using some basic set theory. A couple of answers already point you to STL sets for that matter. With that, you'll need to iterate through the reported values to test for membership in the set.
However, is there an opportunity to affect what is reported with each "trigger"? For example, if this is something like a select poll, you could just put whatever it is that you're polling into a different state so that it is not reported as ready in subsequent triggers.