RRDTOOL - RPN-Expression help: how do I reference input in a COMPUTE DS (add two DS values, or the LAST DS value with a new number)?

I have a data feed that has a single value that increases over time until a forced wrap-around.
I have the wrap-around under control.
The value from the data feed I pass into a RRD GAUGE as ds1.
I want to add a couple of data sources to handle exceptions: when my script (which calls rrdupdate) detects a certain condition, it should record some extra details for reporting.
When the condition is true in the script, I want to update the RRD with:
the normal value into ds1
the difference between the prior value and the current value into ds2, marked as batch exceptions
a count (sum) of all ds2 values, kept in a similar way to ds1.
I've been playing with the below, but wonder if there is a method using COMPUTE, or whether I need to code all the logic into the bash script: poll rrdinfo, fetch the last_ds lines and prep the data accordingly. Does the rrd COMPUTE type have the ability to read other DSs?
If ds2.value > 0 then set ds3.value to (ds3.last_ds + ds2.value) ?
I looked at the rpn-expression documentation and found it references 'input', but it does not show how to feed those inputs into the COMPUTE operation.
eg:
Current state
DS:ds1:GAUGE:28800:0:U
DS:ds2:COUNTER:1800:0:U
DS:ds3:GAUGE:1800:0:U
RRA:LAST:0.99999:1:72000
RRA:LAST:0.99999:4:17568
RRA:LAST:0.99999:8:18000
RRA:LAST:0.99999:32:4500
RRA:LAST:0.99999:96:1825
Desired state?
DS:ds1:GAUGE:28800:0:U
DS:ds2:COUNTER:1800:0:U
DS:ds3:COMPUTE:1800:0:U
DS:cs1:COMPUTE:input,0,GT,ds3,ds2,+,input,IF <-- what is 'input'? Is it passed via rrdupdate cs1:[value]?
RRA:LAST:0.99999:1:72000
RRA:LAST:0.99999:4:17568
RRA:LAST:0.99999:8:18000
RRA:LAST:0.99999:32:4500
RRA:LAST:0.99999:96:1825
Alternatively, ds1 could store the total without the exceptions and I could use an AREA and a STACK to plot the total.
If someone knowledgeable about RPN expressions as used with RRDs could clarify the rpn-expression input reference and what is possible, it would be a massive help; there is very limited info online about this. If the script has to poll the RRD files for last_ds and do the calculations, that is fine, but if the COMPUTE DS type has the smarts built in, I'd rather use them.
Thank you.

A COMPUTE type datasource needs to have an RPN formula that completely describes it in terms of the other (non-compute) datasources. So, you cannot have multiple definitions of the same source, nor can it populate until the last of the other DS for that time window have been populated.
So, for example, if you have datasources a and b, and you want a COMPUTE type datasource that is equal to a if b<0, and a+b otherwise, you can use
DS:a:COUNTER:1800:0:U
DS:b:GAUGE:1800:0:U
DS:c:COMPUTE:b,0,GT,a,b,+,a,IF
From this, you can see how the definition of c uses RPN to define a single value using the values of a and b (and a constant). The calculation is performed solely within the configured time interval, and subsequently all three are stored and aggregated in the defined RRAs in the same way. You can also then use the graph functions over c exactly as you would for a or b; the compute function is applied only at data-storing time.
Here is a full working example for the benefit of the original poster:
rrdtool create test.rrd --step 1800 \
DS:a:COUNTER:28800:0:U \
DS:b:COUNTER:28000:0:U \
DS:c:GAUGE:3600:0:U \
DS:d:COUNTER:3600:0:U \
DS:x:COMPUTE:b,0,GT,a,b,+,a,IF \
RRA:LAST:0.99999:1:72000 \
RRA:LAST:0.99999:4:17568 \
RRA:LAST:0.99999:8:18000 \
RRA:LAST:0.99999:32:4500 \
RRA:LAST:0.99999:96:1825
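To see it in action, here is a minimal usage sketch (the timestamps and values are illustrative only, not taken from the original poster's feed). Only the non-COMPUTE data sources are supplied on update, in definition order (a:b:c:d); x is derived at store time from its RPN expression:
rrdtool update test.rrd 1200000600:100:5:42:7
rrdtool update test.rrd 1200002400:160:0:44:9
# fetching any RRA returns a column for every DS, including the computed x
rrdtool fetch test.rrd LAST --start 1200000600 --end 1200004200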

Related

How does rrdtool RRDB associate/bind an RRA to a DS?

How does rrdtool RRDB associate/bind an RRA to a DS? An XML dump does not seem to reveal where this binding info is kept. Neither does rrdinfo. But this info must be in there, because multiple RRAs can be associated with a single DS. Am I missing something?
Every DS is in every RRA. You do not need to bind a specific DS to specific RRAs as the vector created from the set of DS is common to all.
The difference between RRAs is not that they have a different DS vector, but that they have different lengths and granularities, and different roll-up functions. This enables the RRA to pre-calculate summary data at storage time, so that at graph time, most of the work is already done, speeding up the process.
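For illustration, a small sketch (the file and DS names are my own, not from the question): a single create defines the DS set once, every RRA covers all of them, and a fetch against either RRA returns a column for each DS:
rrdtool create demo.rrd --step 300 \
DS:in:COUNTER:600:0:U \
DS:out:COUNTER:600:0:U \
RRA:AVERAGE:0.5:1:288 \
RRA:MAX:0.5:12:168
# both commands return values for 'in' and 'out'; only the length,
# granularity and roll-up function differ between the two RRAs
rrdtool fetch demo.rrd AVERAGE --start -3600
rrdtool fetch demo.rrd MAX --start -86400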

Differences of using the rrdtool RRD PDP or RRA consolidation function to calculate the average reading?

I'm updating a rrdtool round-robin database with 1 minute intervals. I want to store the average value of five updates as one RRA entry in rrdtool RRD. One way to do this is like this:
$ rrdtool create foo.rrd --start 1000000500 --step 60 \
> DS:ping:GAUGE:120:0:1000 RRA:AVERAGE:0.5:5:12; \
> rrdtool update foo.rrd 1000000560:10 1000000620:20 \
> 1000000680:30 1000000740:40 1000000800:50
It accumulates five readings and stores the average of those as an entry in RRA. However, one could achieve the same with this:
$ rrdtool create bar.rrd --start 1000000500 --step 300 \
> DS:ping:GAUGE:600:0:1000 RRA:AVERAGE:0.5:1:12; \
> rrdtool update bar.rrd 1000000560:10 1000000620:20 \
> 1000000680:30 1000000740:40 1000000800:50
As seen above, the step is 300 seconds, but as the RRD PDP accepts values between the intervals and calculates the average, both examples store 30 ((10+20+30+40+50)/5) in the RRA. One difference, as far as I can tell, is that the first example requires at least three updates to store an entry in the RRA, while in the second example a single update within the 300-second step is enough. Are there any other differences?
These two examples are not really the same thing under the covers, though they can appear the same in some circumstances.
In the first, you have a 60s step, and your RRA stores the average of 5 PDPs in each CDP.
In the second, you have a 300s step, and your RRA stores each PDP as a CDP.
Here are some differences:
In the first, you will need at least one sample (PDP) every 2 minutes; so three to cover each CDP in the RRA. In the second, you need a single sample every CDP.
In the first, data normalisation of each sample happens over a 60s window. In the second, it happens over a 300s window. This will make things look different when the samples arrive irregularly.
In the first, you can have up to 120s with no data before you get an Unknown; in the second, up to 600s.
While the RRA outcome is pretty much the same in both, which you choose would depend on the nature of your incoming data (how often you get samples, how irregular these are), and your storage and display requirements (if you need higher granularity stored or displayed). The first option is more accurate if you have frequent samples; the second is less storage and less work but may sacrifice some data if you have updates more frequent than the step.
Note that, if you have other RRA types than just AVG, having a smaller step will make calculations more accurate.
In general, I would recommend that you set the step to be close to the expected average sample frequency, with a heartbeat set according to how regular the data are. Set your RRA consolidation depending on how you need to view and display the data, and how long you are required to hold history for. Create RRAs corresponding to your normal display roll-ups, in order to minimise the amount of on-the-fly calculation.
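As a rough sketch of that advice (the file name, DS name and row counts are illustrative, assuming samples arrive about once a minute and the usual daily/weekly/monthly/yearly views):
# 1-min CDPs for ~1 day, 5-min for ~1 week, 30-min for ~1 month,
# 6-hour for ~1 year, plus a MAX roll-up to keep the weekly peaks
rrdtool create sensor.rrd --step 60 \
DS:ping:GAUGE:120:0:1000 \
RRA:AVERAGE:0.5:1:1440 \
RRA:AVERAGE:0.5:5:2016 \
RRA:AVERAGE:0.5:30:1488 \
RRA:AVERAGE:0.5:360:1460 \
RRA:MAX:0.5:5:2016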

How to put a 100% TICK or VRULE for unknown data with RRDTool

I'm plotting a number of data series on various graphs using RRDTool. Occasionally I get unknown data points, which is totally expected, especially if the computer updating the RRDs is offline.
That's cool; however, when this happens, I want there to be a nice big red line for each and every unknown, so it makes the graph's viewer very aware that the value at those points is not 0, but actually UNKNOWN.
What I have:
What I want (Photoshopped):
Is there an easy/elegant way to accomplish this?
Here's what worked:
I used the CDEF with an existing Data Source (DS) instead of having to create a new DS.
I added the following 2 lines in my RRDTool Graph section
'CDEF:up=a1,0,*,0,EQ,0,1,IF' \
'TICK:up#DB0865:1.0' \
The CDEF does the calculation of:
a1 * 0
Then it compares the result of that to 0. If they're equal, it sets "up" to "0"; otherwise it sets "up" to "1".
The only time they would not be equal is if "a1" was unknown.
Therefore when there is a gap in the graph (no data), it will have a 100% vertical bar (TICK) of a deep purple/pink colour (#DB0865)
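For context, here is a sketch of where those two lines might sit in a complete graph command (the file name, the DS name a1 and the remaining graph elements are assumptions for illustration):
rrdtool graph traffic.png --start -86400 --width 600 --height 200 \
'DEF:a1=traffic.rrd:a1:AVERAGE' \
'CDEF:up=a1,0,*,0,EQ,0,1,IF' \
'TICK:up#DB0865:1.0:no data' \
'LINE1:a1#0000FF:traffic'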
Even though the documentation on the RRDTool site indicates that a DS can be added to an existing RRD, it actually cannot be (according to Tobi Oetiker). So I went with the above method to avoid losing all the data in the RRDs I already have by creating a new RRD with a new DS.
Here's an example of how it looks:
The elegant way would be to check whether the load includes any reasonable value. If not, add 1 to a DS that you create for this purpose.
So for the Round Robin Database, add a new DS that will hold the value 0 or 1
DS:somestatus1:GAUGE:600:U:U
and then start feeding 0 or 1 into this DS depending on whether your primary DS is available.
Finally, for drawing the graph:
DEF:somestatus1=$RRD_FILE:somestatus1:AVERAGE \
CDEF:my_status_cdef=somestatus1,1,0,IF \
TICK:my_status_cdef#e0ffe0:1.0:"Device was ON\n" \
Each TICK will draw a 100%-height vertical bar over the graph, as you need.
Another option is to create a conditional CDEF that will draw the TICK (or an AREA) when the primary DS is unknown.
This method plots an area when "offline": the CDEF checks whether the load measurement is UN (unknown); if it is, it returns 1, which is multiplied by INF to stretch it to the top of the plot.
CDEF:offline=load,UN,INF,* \
AREA:offline#FF000011: \
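For context, a sketch of a complete graph command around those two fragments (load.rrd, the DS name load and the extra LINE are assumptions for illustration):
rrdtool graph load.png --start -86400 --width 600 --height 200 \
'DEF:load=load.rrd:load:AVERAGE' \
'CDEF:offline=load,UN,INF,*' \
'AREA:offline#FF000011:offline' \
'LINE1:load#0000FF:load average'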

RRD graphs in Zenoss showing NaN on large time ranges

I am trying to create a COMMAND JSON datasource to monitor some values, for example from a script like this:
import json
from random import random

print json.dumps({
    'values': {
        '': {'random': random()},
    },
    'events': []
})
And when I just start zencommand, the appropriate rrd file is created, but the cur, avg and max values on the graph show me NaN. Those NaNs are replaced by actual numbers when I zoom in to the current point in time, which is not very far from the start of monitoring.
Why doesn't it show correct min, max and avg values before I zoom in? Is that somehow related to consolidation? I read http://www.vandenbogaerdt.nl/rrdtool/min-avg-max.php, but that page doesn't say anything about NaN values.
And is there any way to zoom in to the current timestamp more quickly, to see some data sooner?
When you are zoomed out, you'll be looking at the lower-granularity RRAs (Round Robin Archives). These do not get populated until enough data are in the higher-granularity ones; so, for example, if you have a 5min-granularity RRA, a 1hr-granularity RRA, and a 1day-granularity RRA, and have collected data for the last 45min, then you will see ~8 data points in your 'daily' graph (which uses the 5min RRA), but nothing in your 'monthly' (which will use the 1hr RRA) or your 'yearly' (which uses the 1day RRA).
This applies to any RRA; AVG, LAST, MAX, etc. Until the consolidated time window is complete, and the full complement of Primary Data Points has been collected for consolidation, the consolidated data point value is undefined.
RRDTool picks the RRA to use based on the requested graph data width and pixel width, as well as the requested consolidation functions. Although there are ways to force RRDtool to use a higher-granularity RRA than it needs to, and to consolidate on the fly, this is inefficient and slow. It also makes having the lower-granularity RRA pointless and throws away one of the major benefits of RRDtool: that it performs consolidation at update time, making graphing faster.
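For completeness, and as far as I know, the usual mechanism for that forcing is the optional step= (and reduce=) modifier on a DEF. A sketch, with the file and DS names assumed for illustration; expect it to be slower than letting RRDTool pick the matching RRA:
rrdtool graph month.png --start -2678400 --width 800 --height 200 \
'DEF:val=random.rrd:random:AVERAGE:step=300:reduce=AVERAGE' \
'LINE1:val#0000FF:random'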

rrdtool Holt-Winters feature

I'm mainly writing because I'm using the rrdtool Holt-Winters feature, but sadly it does not work as I would like. To start, here is the command line I use to create the rrd file:
`/usr/bin/rrdtool create /home/spread/httphw/rrd/httpr.rrd --start $_[7]-60 --step 60 DS:200:GAUGE:120:U:U RRA:AVERAGE:0.5:1:1440 RRA:HWPREDICT:1440:0.1:0.0035:288 RRA:AVERAGE:0.5:6:700 RRA:AVERAGE:0.5:24:775 RRA:AVERAGE:0.5:288:797`;
After that I basically insert data and then I draw the graph like that:
`/usr/bin/rrdtool graph file.png --start $start --end $time --width 600 --height 200 --imgformat PNG DEF:doscents=$rrd:200:AVERAGE DEF:pred=$rrd:200:HWPREDICT DEF:dev=$rrd:200:DEVPREDICT DEF:fail=$rrd:200:FAILURES TICK:fail#ffffa0:1.0:"Failures Average" CDEF:scale200=doscents,8,* CDEF:upper=pred,dev,2,*,+ CDEF:lower=pred,dev,2,*,- CDEF:scaledupper=upper,8,* CDEF:scaledlower=lower,8,* LINE1:scale200#0000ff:"Average" LINE1:scaledupper#ff0000:"Upper Bound Average" LINE1:scaledlower#ff0000:"Lower Bound Average"`;
Here's the resulting image:
Then I get a graph like that, but as you can see there are yellow lines indicating that there has been an error, when that's not true. I mean, the activity line at that point is slightly outside the red area, but that is not an error. I basically need to understand which values I have to set, and based on what; I tried it out but I don't really understand the system very well.
Any suggestions from an rrdtool expert?
Many thanks in advance.
Being outside the expected range is an error, as far as Holt-Winters is concerned.
The Holt-Winters FAILURES RRA is slightly more complex than just 'outside the range HWPREDICT ± 2*DEVPREDICT'. In fact, there are additional threshold and window parameters, which (if not specified, as in your case) default to 7 and 9 respectively.
These cause a smoothing of the samples over window samples before comparison, and only trigger a FAILURE flag when there is a sequence of threshold consecutive errors.
As a result, you see a FAILURE trigger where you do, and not in the larger area to the left (which averages down to within the range). This makes it a better indicator of consistently out-of-range behaviour, rather than of a slope arriving slightly too early or a temporary spike.
If you want to avoid this, and have a FAILURE flag every time the data goes outside of the predicted bounds, then set the FAILURE parameters to 1 and 1. To do this, you would need to explicitly define the additional HW RRAs rather than having them defined implicitly as you do now.
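A sketch of what that explicit definition could look like, mirroring the parameters from the question's create command (--start omitted for brevity; the gamma values and row counts follow what I understand the implicit defaults to be, so adjust as needed). The important parts are the FAILURES RRA ending in threshold 1 and window 1, and the trailing index numbers that tie the RRAs together (SEASONAL and DEVSEASONAL point at the HWPREDICT RRA, DEVPREDICT and FAILURES point at the DEVSEASONAL RRA):
/usr/bin/rrdtool create /home/spread/httphw/rrd/httpr.rrd --step 60 \
'DS:200:GAUGE:120:U:U' \
'RRA:AVERAGE:0.5:1:1440' \
'RRA:HWPREDICT:1440:0.1:0.0035:288:3' \
'RRA:SEASONAL:288:0.1:2' \
'RRA:DEVSEASONAL:288:0.1:2' \
'RRA:DEVPREDICT:1440:4' \
'RRA:FAILURES:288:1:1:4' \
'RRA:AVERAGE:0.5:6:700' \
'RRA:AVERAGE:0.5:24:775' \
'RRA:AVERAGE:0.5:288:797'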
On a separate note, it is bad practice to have a DS with a purely numerical name; it can cause confusion in RPN calculations. Always start a DS name with a lowercase letter.