What's the difference between "number of time series violates" and the other conditional triggers? I can easily imagine what the other conditional triggers would do, but I have no idea what this one would do.
I would interpret "number of time series violates" in two ways.
Example A: I can have 5 VM instances, with the conditional trigger "number of time series violates" set to at least 3, meaning one instance becomes absent 3 times, and "is absent" set for 1 day.
Example B: I can have 5 VM instances, with the conditional trigger "number of time series violates" set to at least 3, meaning at least 3 VM instances would have to exceed the threshold (become absent), and "is absent" set for 1 day.
Thank you in advance for clarifying my misunderstanding.
Example B is correct. Let's assume you have a condition for VM instances and CPU usage. You have 5 VMs, so you have 5 different time series, one for each VM. When you set "Number of time series violates" to 3, that means 3 time series have exceeded the threshold, or in other words 3 out of 5 of the time series are in violation. Alternatively, you can use the percentage option and set it to 60%, which will yield the same result.
Setting a longer time frame will give you the behaviour you are describing in Example A.
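As a rough illustration only (this is not the Cloud Monitoring API, just the decision logic rewritten in plain Python), the combinator options can be thought of like this:

```python
# Sketch of the "any / all / percent / number of time series violates" logic.
# Each entry in `violations` is True if that time series currently violates.
def condition_met(violations, mode, value=None):
    total = len(violations)        # e.g. 5 VM instances -> 5 time series
    violating = sum(violations)    # how many of them are in violation

    if mode == "any":
        return violating >= 1
    if mode == "all":
        return violating == total
    if mode == "number":           # "number of time series violates"
        return violating >= value
    if mode == "percent":
        return violating / total * 100 >= value
    raise ValueError(f"unknown mode: {mode}")

# 5 VMs, 3 of them absent/violating -> both options fire:
flags = [True, True, True, False, False]
print(condition_met(flags, "number", 3))    # True
print(condition_met(flags, "percent", 60))  # True (60% of 5 series = 3)
```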
Introduction
We are trying to "measure" the cost of usage of a specific use case on one of our Aurora DBs that is not used very often (we use it for staging).
Yesterday at 18:18 hrs. UTC we issued some representative queries to it and today we were examining the resulting graphs via Amazon CloudWatch Insights.
Since we are being billed USD 0.22 per million read/write IOs, we need to know how many of those there were during our little experiment yesterday.
A complicating factor is that in the Cost Explorer it is not possible to group the final billed costs for read/write IOs per DB instance! Therefore, the only thing we can think of to estimate the cost is from the read/write volume IO graphs in CloudWatch Insights.
So we went to CloudWatch Insights and selected the graphs for read/write IOs. Then we selected the period of time in which we did our experiment. Finally, we examined the graphs with different options: "Number" and "Lines".
Graph with "number"
This shows us the picture below, suggesting a total billable IO count of 266+510=776. Since we have chosen the "Sum" metric, we assume this would indicate a cost of about USD 0.00017 in total.
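As a quick sanity check of that arithmetic (a sketch using only the numbers already mentioned in this question):

```python
# Back-of-the-envelope cost check using the figures from this question.
read_ios = 266
write_ios = 510
price_per_million_ios = 0.22  # USD per million read/write IOs

total_ios = read_ios + write_ios                      # 776
cost = total_ios / 1_000_000 * price_per_million_ios
print(total_ios, round(cost, 6))                      # 776 0.000171
```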
Graph with "lines"
However, if we choose the "Lines" option, then we see another picture, with 5 points on the line: the first ones around 500 (for read IOs) and the last one at approx. 750, suggesting a total of 5,000 read/write IOs.
Our question
We are not really sure which interpretation to go with and the difference is significant.
So our question now is: how much did our little experiment cost us and, equivalently, how should we interpret these graphs?
Edit:
Using 5-minute intervals (as suggested in the comments) we get (see below) a horizontal line with points at 255 (read IOs) for a whole hour around the time we did our experiment. But the experiment took less than 1 minute, at 19:18 (UTC).
Will the (read) billing be for 12 * 255 IOs or 255 ... (or something else altogether)?
Note: This question triggered another follow-up question created here: AWS CloudWatch insights graph — read volume IOs are up much longer than actual reading
From the Aurora RDS documentation:
VolumeReadIOPs
The number of billed read I/O operations from a cluster volume within a 5-minute interval.
Billed read operations are calculated at the cluster volume level, aggregated from all instances in the Aurora DB cluster, and then reported at 5-minute intervals. The value is calculated by taking the value of the Read operations metric over a 5-minute period. You can determine the amount of billed read operations per second by taking the value of the Billed read operations metric and dividing by 300 seconds. For example, if the Billed read operations returns 13,686, then the billed read operations per second is 45 (13,686 / 300 = 45.62).
You accrue billed read operations for queries that request database pages that aren't in the buffer cache and must be loaded from storage. You might see spikes in billed read operations as query results are read from storage and then loaded into the buffer cache.
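A worked version of the documentation's own example, converting one 5-minute "billed read operations" datapoint into an approximate per-second rate:

```python
# The docs' example: 13,686 billed read operations in a 5-minute interval.
billed_read_ops = 13_686
per_second = billed_read_ops / 300   # 300 seconds in the 5-minute interval
print(round(per_second, 2))          # 45.62
```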
Imagine AWS reports these datapoints every 5 minutes:
[100, 150, 200, 70, 140, 10]
And you used the Sum over 15 minutes statistic, like in your image.
Edit: First, the "number" visualization represents the whole selected duration, aggregated, which would be the total of (100+150+200+70+140+10) = 670.
The "line" visualization will represent all the aggregated groups. which would in this case be 2 points (100+150+200) and (70+140+10)
It can be a little bit hard to understand at first if you are not used to data points and aggregations. So I suggest that you set your "line" chart to Sum of 5 minutes you will need to get value of each points and devide by 300 as suggested by the doc then sum them all
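A small sketch of the aggregation behaviour described above, using the made-up 5-minute datapoints from this answer:

```python
# Made-up 5-minute datapoints from the example above.
datapoints = [100, 150, 200, 70, 140, 10]

# "Number" visualization with Sum over the whole selected duration:
number_view = sum(datapoints)                # 670

# "Lines" visualization with Sum over 15 minutes: one point per 3 datapoints.
lines_view = [sum(datapoints[i:i + 3]) for i in range(0, len(datapoints), 3)]

print(number_view)   # 670
print(lines_view)    # [450, 220]
```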
Added images for easier visualization
Can I configure many different interval expressions for the "retry time cycle", for example
something like this: "R6/PT10S, R2/PT30M" - 6 retries every 10 seconds and then 2 retries every 30 minutes?
Thanks in advance,
Wladi
The job executor section of the Camunda user guide only shows an example of comma-separated intervals without repeats.
Looking at the code, it seems a repeat is only recognized if there is a single interval configured:
https://github.com/camunda/camunda-bpm-platform/blob/7.13.0/engine/src/main/java/org/camunda/bpm/engine/impl/util/ParseUtil.java#L88
The lookup of which interval to apply also does not consider any repeats:
https://github.com/camunda/camunda-bpm-platform/blob/7.13.0/engine/src/main/java/org/camunda/bpm/engine/impl/cmd/DefaultJobRetryCmd.java#L113
This sounds like a useful feature though, so you might want to open an issue in the Camunda issue tracker.
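As a rough Python illustration of the limitation described above (this is not Camunda's implementation, which lives in the Java classes linked above; it only mirrors the described behaviour):

```python
# Sketch: repeats (the R<n>/ prefix) are only honoured for a single interval;
# a comma-separated list is treated as plain intervals without repeats, so
# "R6/PT10S, R2/PT30M" would NOT expand to 6x10s followed by 2x30m retries.
import re

def parse_retry_cycle(expression):
    parts = [p.strip() for p in expression.split(",")]
    if len(parts) == 1:
        match = re.match(r"R(\d+)/(.+)", parts[0])
        if match:                       # e.g. "R6/PT10S" -> 6 retries, PT10S apart
            return int(match.group(1)), [match.group(2)]
        return 1, parts
    return len(parts), parts            # list form: one retry per listed interval

print(parse_retry_cycle("R6/PT10S"))            # (6, ['PT10S'])
print(parse_retry_cycle("PT10S, PT30M, PT1H"))  # (3, ['PT10S', 'PT30M', 'PT1H'])
```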
What's the difference between the two options "Any time series violates" and "All time series violate"? I can easily imagine what the former one would do, but I have no idea what the latter one would do.
All time series? How long is its range? And why does it have a "for" option?
First, what does "time series violates" mean? It's when the CURRENT VALUE of a metric is outside of the expected range, e.g. above the specified threshold.
Second, "any/all/percent/number": let's say you have 5 time series, e.g. CPU usage on 5 instances; then, per the dropdown options, the whole alert condition will be violated when:
"any time series": any 1 of the time series is in violation
"all time series": all 5 of the time series are in violation
"percent of time series" (40%): 2 out of 5 of the time series are in violation, and yes, selecting 39% or 41% on small numbers will give you different results, so
"number of time series" (3): 3 out of 5 of the time series are in violation
Third, "for", a.k.a. the duration box: it looks like it means "if my time series violates FOR 5 minutes, then violate the condition". For some simpler alerts this can even work, but once you try to combine it with, say, "metric is absent" or another complicated configuration, you will see that what actually happens is "wait for 5 minutes after the problem is there, and only then trigger the violation".
In practice, the use of the "for" field is discouraged and it's better to keep it at the default "Most recent value".
If you do need "CPU usage is above 90% for 5 minutes", then the correct way of doing it is by denoising/smoothing your data (see the sketch after this list):
set the alignment period to 5 minutes (or whatever sliding window you want)
then choose a reasonable aligner (like mean, which will average the values)
and then, while the chart will have fewer datapoints, they will be less noisy and you can act upon the latest value.
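A small sketch of that smoothing idea in plain Python (the per-minute CPU values are made up; in Cloud Monitoring the alignment period and aligner do this for you, not your own code):

```python
# Made-up per-minute CPU samples (percent); align to 5-minute windows with a
# mean aligner, then evaluate the threshold on the most recent aligned value.
cpu_samples = [40, 95, 30, 92, 88, 91, 93, 94, 97, 90]

window = 5  # minutes per alignment period
aligned = [
    sum(cpu_samples[i:i + window]) / len(cpu_samples[i:i + window])
    for i in range(0, len(cpu_samples), window)
]

threshold = 90
print(aligned)                  # [69.0, 93.0]
print(aligned[-1] > threshold)  # True -> the condition violates on the latest value
```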
"Any time series violates" will trigger if there is a violation of any time series inside the window chosen in "for".
Let's say there are 5 time series; it will trigger if there is a violation in any one of them.
For "all time series violate", it will trigger only if it happens in 5 out of 5.
I'm new to linear programming and trying to develop an ILP model around a problem I'm trying to solve.
My problem is analogous to a machine resource scheduling problem. I have a set of binary variables to represent paired combinations of machines with a discrete-time grid. Job A takes 1 hour and Job B takes 1 hour and 15 minutes, so the time grid should be in 15-minute intervals. Therefore Job A would use 4 time units, and Job B would use 5 time units.
I'm having difficulty figuring out how to express a constraint such that when a job is assigned to a machine, the units it occupies are sequential in the time variable. Is there an example of how to model this constraint? I'm using PuLP if it helps.
Thanks!
You want to implement the constraint:
x(t-1) = 0 and x(t) = 1 ==> x(t)+...+x(t+n-1) = n
One way is:
x(t)+...+x(t+n-1) >= n*(x(t)-x(t-1))
Notes:
you need to repeat this constraint for each t.
A slightly better version is:
x(t+1)+...+x(t+n-1) >= (n-1)*(x(t)-x(t-1))
There is also a disaggregated version of this constraint that may help performance (depending on the solver: some solvers can do this disaggregation automatically).
Things can become interesting near the beginning and end of the planning period, e.g. a machine that started a job at t=-1. A PuLP sketch of this formulation follows these notes.
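A minimal PuLP sketch of the aggregated constraint above, for a single machine and a single job of length n (the variable names and horizon are made up for the illustration; the end-of-horizon boundary case mentioned in the notes is not handled):

```python
# x[t] = 1 if the job occupies time slot t on the machine.
from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpBinary

T = 12   # number of 15-minute slots in the planning horizon
n = 4    # job length in slots (e.g. Job A = 1 hour)

prob = LpProblem("sequential_job", LpMinimize)
x = LpVariable.dicts("x", range(T), cat=LpBinary)

# Start at t = 0: there is no x[-1], so treat it as 0.
prob += lpSum(x[k] for k in range(n)) >= n * x[0]

# x(t)+...+x(t+n-1) >= n*(x(t)-x(t-1)) for every t where a full run still fits.
for t in range(1, T - n + 1):
    prob += lpSum(x[t + k] for k in range(n)) >= n * (x[t] - x[t - 1])

# The job occupies exactly n slots somewhere in the horizon.
prob += lpSum(x[t] for t in range(T)) == n
```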
Update:
A different approach is just to limit the number of "starts" of a job to 1, i.e. allow the combination
x(j,t-1) = 0 and x(j,t) = 1
to occur at most once for a given job j. This can be handled in a similar way:
start(j,t) >= x(j,t)-x(j,t-1)
sum(t, start(j,t)) <= 1
0 <= start(j,t) <= 1
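And a corresponding PuLP sketch of this start-based formulation for a single job (again with made-up dimensions; the start variables can stay continuous between 0 and 1, as noted above):

```python
# x[t] = 1 if the job occupies slot t; start[t] marks the off-to-on transition.
from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpBinary, LpContinuous

T, n = 12, 5   # horizon in 15-minute slots, job length (e.g. Job B)
prob = LpProblem("single_start", LpMinimize)
x = LpVariable.dicts("x", range(T), cat=LpBinary)
start = LpVariable.dicts("start", range(T), lowBound=0, upBound=1, cat=LpContinuous)

# start(t) >= x(t) - x(t-1), with x(-1) treated as 0.
prob += start[0] >= x[0]
for t in range(1, T):
    prob += start[t] >= x[t] - x[t - 1]

prob += lpSum(start[t] for t in range(T)) <= 1   # at most one start ...
prob += lpSum(x[t] for t in range(T)) == n       # ... and exactly n occupied slots
```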
I have defined the following policies on a t2.micro instance:
Take action A whenever {maximum} of CPU Utilization is >= 80% for at least 2 consecutive period(s) of 1 minute.
Take action B whenever {Minimum} of CPU Utilization is <= 20% for at least 2 consecutive period(s) of 1 minute.
Is my interpretation wrong that if the min (max) of CPU drops below (goes above) 20 (80) for 2 minutes, these rules have to be activated?
Because my collected stats show, for example, that the max of CPU reached 90% twice in two consecutive periods of 1 minute, but I got no alarm!
Cheers
It seems my interpretation is not correct! The policy works based on the average of the metric for every minute. It means the first policy will be triggered if the AVERAGE of the stat datapoints within a minute is >= 80% for two consecutive periods of 1 minute. The reason is simple: CloudWatch does not consider stat datapoints at less than 1-minute granularity. So if I go for a 5-minute period, Max and Min show the correct behavior.
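For reference, a hedged boto3 sketch of how the first alarm might be defined with a 5-minute period, as concluded above (the alarm name and instance id are placeholders, and no alarm action is attached here):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_alarm(
    AlarmName="scale-up-high-cpu-example",      # hypothetical name
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    Statistic="Maximum",
    Period=300,               # 5-minute period, per the conclusion above
    EvaluationPeriods=2,      # 2 consecutive periods
    Threshold=80.0,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
)
```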