How to understand --step option in RRD? - rrdtool

Am I correct that the value of the --step option is used solely for pre-calculating the data slots in the RRD? Or does RRD somehow expect updates at the interval specified with --step?

RRDtool will 're-sample' the data you provide to be in --step interval before continuing to process it. You can deliver as many updates as you wish. RRDtool will take them all into account when building the --step interval.

Related

RRDTOOL - RPN-Expression help, how to reference input in COMPUTE DS (add two DS values or the LAST DS value with a new number)?

I have a data feed that has a single value that increases over time until a forced wrap-around.
I have the wrap-around under control.
The value from the data feed I pass into a RRD GAUGE as ds1.
I want to add a couple of data sources to handle exceptions: when my script (which calls rrdupdate) detects a certain condition, it should add some details for reporting.
When the condition is true in the script, I want to update the RRD with:
the normal value into ds1
the difference of the prior value to the current value to be marked as batch exceptions into ds2
count (sum) all ds2 values in a similar way to ds1.
I've been playing with the definitions below, but wonder whether there is a method using COMPUTE, or whether I need to code all the logic into the bash script to poll rrdinfo, fetch the last_ds lines and prep the data accordingly. Does the rrd COMPUTE type have the ability to read other DSs?
If ds2.value > 0 then set ds3.value to (ds3.last_ds + ds2.value) ?
I looked at the rpn-expression documentation and found that it references 'input', but it does not show how to feed those inputs into the COMPUTE operation.
eg:
Current state
DS:ds1:GAUGE:28800:0:U
DS:ds2:COUNTER:1800:0:U
DS:ds3:GAUGE:1800:0:U
RRA:LAST:0.99999:1:72000
RRA:LAST:0.99999:4:17568
RRA:LAST:0.99999:8:18000
RRA:LAST:0.99999:32:4500
RRA:LAST:0.99999:96:1825
Desired state?
DS:ds1:GAUGE:28800:0:U
DS:ds2:COUNTER:1800:0:U
DS:ds3:COMPUTE:1800:0:U
DS:cs1:COMPUTE:input,0,GT,ds3,ds2,+,input,IF <-- what is 'input' is it passed via rrdupdate cs1:[value]?
RRA:LAST:0.99999:1:72000
RRA:LAST:0.99999:4:17568
RRA:LAST:0.99999:8:18000
RRA:LAST:0.99999:32:4500
RRA:LAST:0.99999:96:1825
Alternatively, ds1 could store the total without the exceptions, and I could use an AREA and a STACK to plot the total.
If someone is knowledgeable about rpn-expressions as used with rrd, it would be a massive help to clarify the rpn-expression input reference and what is possible. There is very limited info online about this. If the script has to poll the RRD files for last_ds and do the calculations, that is fine, but if RRD has the smarts in the COMPUTE DS type, I'd rather use them.
Thank you.
A COMPUTE type datasource needs to have an RPN formula that completely describes it in terms of the other (non-compute) datasources. So, you cannot have multiple definitions of the same source, nor can it populate until the last of the other DS for that time window have been populated.
So, for example, if you have datasources a and b, and you want a COMPUTE type datasource that is equal to a+b if b>0, and a otherwise, you can use
DS:a:COUNTER:1800:0:U
DS:b:GAUGE:1800:0:U
DS:c:COMPUTE:b,0,GT,a,b,+,a,IF
From this, you can see how the definition of c uses RPN to define a single value using the values of a and b (and a constant). The calculation is performed solely within the configured time interval, and subsequently all three are stored and aggregated in the defined RRAs in the same way. You can also then use the graph functions over c exactly as you would for a or b; the compute function is used only at data storing time.
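To make the evaluation order concrete, here is a minimal Python sketch of how an RPN string such as b,0,GT,a,b,+,a,IF reduces to a single value. It is not RRDtool's implementation: eval_rpn is a hypothetical helper that supports only the three operators used here, and the sample values are made up.

```python
def eval_rpn(expr, values):
    """Evaluate a tiny subset of RRDtool-style RPN (GT, + and IF only)."""
    stack = []
    for token in expr.split(","):
        if token == "GT":
            b, a = stack.pop(), stack.pop()
            stack.append(1.0 if a > b else 0.0)
        elif token == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "IF":
            # RPN order is: condition, value-if-true, value-if-false, IF
            if_false, if_true, cond = stack.pop(), stack.pop(), stack.pop()
            stack.append(if_true if cond != 0 else if_false)
        elif token in values:
            stack.append(float(values[token]))  # a data source name
        else:
            stack.append(float(token))          # a numeric constant
    return stack.pop()


EXPR = "b,0,GT,a,b,+,a,IF"
print(eval_rpn(EXPR, {"a": 10, "b": 3}))   # b > 0, so the result is a + b
print(eval_rpn(EXPR, {"a": 10, "b": -2}))  # b <= 0, so the result is a
```

Reading the expression left to right, b,0,GT is the condition, a,b,+ is the value used when it is true, and the final a is the value used when it is false.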
Here is a full working example for the benefit of the original poster:
rrdtool create test.rrd --step 1800 \
DS:a:COUNTER:28800:0:U \
DS:b:COUNTER:28000:0:U \
DS:c:GAUGE:3600:0:U \
DS:d:COUNTER:3600:0:U \
DS:x:COMPUTE:b,0,GT,a,b,+,a,IF \
RRA:LAST:0.99999:1:72000 \
RRA:LAST:0.99999:4:17568 \
RRA:LAST:0.99999:8:18000 \
RRA:LAST:0.99999:32:4500 \
RRA:LAST:0.99999:96:1825

Differences of using the rrdtool RRD PDP or RRA consolidation function to calculate the average reading?

I'm updating a rrdtool round-robin database with 1 minute intervals. I want to store the average value of five updates as one RRA entry in rrdtool RRD. One way to do this is like this:
$ rrdtool create foo.rrd --start 1000000500 --step 60 \
> DS:ping:GAUGE:120:0:1000 RRA:AVERAGE:0.5:5:12; \
> rrdtool update foo.rrd 1000000560:10 1000000620:20 \
> 1000000680:30 1000000740:40 1000000800:50
It accumulates five readings and stores the average of those as an entry in RRA. However, one could achieve the same with this:
$ rrdtool create bar.rrd --start 1000000500 --step 300 \
> DS:ping:GAUGE:600:0:1000 RRA:AVERAGE:0.5:1:12; \
> rrdtool update bar.rrd 1000000560:10 1000000620:20 \
> 1000000680:30 1000000740:40 1000000800:50
As seen above, the step is 300 seconds, but as the RRD PDP accepts values between the intervals and calculates the average, both examples store 30 ((10+20+30+40+50)/5) in the RRA. One difference I can see is that the first example requires at least three updates to store an entry in the RRA, while in the second example a single update within the 300 second step is enough. Are there any other differences?
These two examples are not really the same thing under the covers, though they can appear the same in some circumstances.
In the first, you have a 60s step, and your RRA stores the average of 5 PDPs in each CDP.
In the second, you have a 300s step, and your RRA stores each PDP as a CDP.
Here are some differences:
In the first, you will need at least one sample (PDP) every 2 minutes; so three to cover each CDP in the RRA. In the second, you need a single sample every CDP.
In the first, data normalisation of each sample happens over a 60s window. In the second, it happens over a 300s window. This will make things look different when the samples arrive irregularly.
In the first, you can have up to 120s with no data before you get an Unknown; in the second, up to 600s.
While the RRA outcome is pretty much the same in both, which you choose would depend on the nature of your incoming data (how often you get samples, how irregular these are), and your storage and display requirements (if you need higher granularity stored or displayed). The first option is more accurate if you have frequent samples; the second needs less storage and less work but may sacrifice some data if you have updates more frequent than the step.
Note that, if you have other RRA types than just AVERAGE, having a smaller step will make calculations more accurate.
In general, I would recommend that you set the step to be close to the expected average sample interval, with a heartbeat (latency) setting according to how regular the data are. Set your RRA consolidation depending on how you need to view and display the data, and how long you are required to hold history for. Create RRAs corresponding to normal display rollups, in order to minimise the amount of on-the-fly calculation.
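The two setups can be compared with a small Python sketch of the normalisation step. It is a simplification under stated assumptions, not RRDtool's code: normalise is a hypothetical helper, each GAUGE value is assumed to hold for the whole interval since the previous update, and heartbeat and Unknown handling are ignored.

```python
def normalise(updates, start, step):
    """Time-weighted average of GAUGE samples for each step-wide window.

    Assumes every window is fully covered by data; heartbeat and
    Unknown handling are deliberately left out of this sketch.
    """
    windows = {}
    prev = start
    for ts, value in updates:
        t = prev
        while t < ts:
            w_start = (t // step) * step          # window containing t
            seg_end = min(ts, w_start + step)     # stop at the window edge
            windows[w_start] = windows.get(w_start, 0.0) + value * (seg_end - t)
            t = seg_end
        prev = ts
    return [windows[w] / step for w in sorted(windows)]


# The five updates from the question.
UPDATES = [(1000000560, 10), (1000000620, 20), (1000000680, 30),
           (1000000740, 40), (1000000800, 50)]

pdps_60 = normalise(UPDATES, 1000000500, 60)
pdps_300 = normalise(UPDATES, 1000000500, 300)
print(pdps_60)                      # five 60 s PDPs
print(sum(pdps_60) / len(pdps_60))  # the AVERAGE CDP built from 5 PDPs
print(pdps_300)                     # a single 300 s PDP
```

With regular updates both routes arrive at the same stored value. They start to differ once updates arrive irregularly or are missing, because the averaging windows (and the heartbeat limits) are different.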

rrdtool: Compute 95th percentile of data within a sliding window

I'm using rrdtool to graph data about CPU usage as produced and stored by Munin. Munin (at least for us) stores each data-series in an .rrd file with 12 RRAs: "MIN", "MAX", and "AVERAGE" over each of the four periods "last 2d in 5m intervals", "last 9d in 30m intervals", "last 270d in 12h intervals", and "last 177y in 144d intervals".
I already know how to use rrdtool graph to produce a trend line indicating where my average CPU usage is going. (For simplicity, we can pretend I'm on a single-CPU system; in real life I have more code to deal with that.)
rrdtool graph /tmp/foo.png \
--start -12w --end +24w \
--lower-limit 0 --upper-limit 100 --rigid \
--title 'cpu usage' --width 620 --height 200 --border 0 \
--vertical-label 'cpu usage' \
DEF:idle=/var/lib/munin/mybox/mybox-cpu-idle-d.rrd:42:AVERAGE \
DEF:iowait=/var/lib/munin/mybox/mybox-cpu-iowait-d.rrd:42:AVERAGE \
CDEF:percent_used=100,idle,-,iowait,- \
AREA:percent_used#00880077:'cpu usage' \
VDEF:fit_m=percent_used,LSLSLOPE \
VDEF:fit_b=percent_used,LSLINT \
CDEF:trendline=percent_used,POP,fit_m,COUNT,*,fit_b,+ \
LINE1:trendline#FFBB00:'Trend since 12w ago'
The problem with this graph is that it shows only the average CPU usage trend. But my workload is spiky: usage is very low 90% of the time and then has brief spikes. What I really care about is the trend of the spikes in CPU usage.
So I could run the same command replacing AVERAGE with MAX... but the actual maxes are so randomly distributed (and usually close to 100%) that they don't produce any useful trend line.
So I'm thinking that the graph I actually want would be a graph of the 95th percentile (or maybe just the 75th percentile... ideally I'd be able to adjust the parameter), where that "percentile" is taken over the data in each consecutive 24-hour period.
Conceptually, I want to boil down our last 9 days of data (48 data points per day) into just 9 data points (1 data point per day — representing the Nth percentile of the 48 original points from that day).
And then I'd fit a line to that data using LSLSLOPE and LSLINT and display it on the same graph as the rest of this stuff.
But I can't figure out how to boil down the data in this way, using rrdtool's RPN facilities.
I know that I can use PERCENTNAN to get the scalar number that is the 95th percentile of my whole data-series, but I want a data-series consisting of 9 numbers, not just one scalar.
I know that I can use TRENDNAN to get a data-series that is the mean of a sliding window of my data-series, which would be good enough if only it gave me the median (50th percentile) instead of the mean, and then allowed me to adjust that parameter from "50" up to "95"... but it doesn't.
Alternatively, I know how to use Python to compute the series I want, using rrdtool first and rrdtool fetch, but then there's no simple way to feed that series back into rrdtool to create the graph.
I'm thinking maybe I could extract usage_today, usage_yesterday, usage_2d, usage_3d,... into nine separate series, use PERCENTNAN on them all individually, and then somehow fit a line to that. But that's mostly desperate handwaving; if someone posted an answer that actually made that approach work, I'd accept it.
RRDTool has 95th percentile functionality built in. Note that the accuracy of the percentile calculations will depend on the granularity of the data available in the requested time period, though, so the bigger your 1-pdp RRA is, the better.
So, for example, to get a horizontal line at the 95th percentile, we can use these directives:
DEF:idlehr=/var/lib/munin/mybox/mybox-cpu-idle-d.rrd:42:AVERAGE:step=1
VDEF:pctidle=idlehr,95,PERCENTNAN
HRULE:pctidle#ff0000:95th_Percentile
The step=1 on the end of the DEF ensures that the highest resolution data available will be selected. This may be computationally intensive if you're graphing for a full year and high resolution data are available for this time window!
The problem is, though, that you want a graph showing a different value for each day -- in effect, a sliding window of percentile calculations, in the same way as TREND and PREDICT work, but with a step of one day. RRDTool cannot do this.
So, the answer is, you can show a graph for one day with a single value percentile for that day. You cannot create a graph with one data point per day, where that data point is calculated as the percentile for that day.
The only way I can think of to achieve this is to call rrdtool xport repeatedly to calculate the percentile values for a sequence of days, and then use that data to generate a bar graph in another graphing package.
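As a rough illustration of that external step, here is a Python sketch that reduces a list of (timestamp, value) pairs, such as you might parse out of rrdtool xport or rrdtool fetch output, to one percentile per day. The parsing is left out, daily_percentile is a hypothetical helper, days are taken as UTC days, and the nearest-rank percentile used here is an assumption that may not match PERCENTNAN exactly.

```python
import math
from collections import defaultdict


def daily_percentile(samples, pct):
    """Return {day_start_timestamp: percentile} for (timestamp, value) pairs.

    Days are UTC days. The percentile uses the nearest-rank method.
    None and NaN values are skipped.
    """
    by_day = defaultdict(list)
    for ts, value in samples:
        if value is not None and not math.isnan(value):
            by_day[ts - ts % 86400].append(value)
    result = {}
    for day, values in sorted(by_day.items()):
        values.sort()
        rank = max(1, math.ceil(pct / 100 * len(values)))
        result[day] = values[rank - 1]
    return result


# Made-up data: two days of 48 half-hourly samples each.
samples = [(d * 86400 + i * 1800, float(i + d))
           for d in range(2) for i in range(48)]
print(daily_percentile(samples, 95))
```

The resulting per-day values can then be plotted, or fitted with a trend line, in whatever external package you prefer.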

RRDTOOL and custom data

I'm trying to use rrdtool to make some graphs, but it's not working as I wanted.
Here is the situation:
I have a file with data that are collected every 30 seconds, but I can access this file only the day after. For example, if I want to graph Tuesday's data, I have to wait until Wednesday morning.
So what I have done is to create a new database with this information:
rrdtool create filename.rrd --step '30' 'DS:t634:GAUGE:60:U:U' 'RRA:AVERAGE:0.5:1:1000'
collected data:
rrdtool update filename.rrd 1390231080:1
rrdtool update filename.rrd 1390231110:2
rrdtool update filename.rrd 1390231140:3
rrdtool update filename.rrd 1390231170:4
....
generated a graph:
rrdtool graph 'graph.png' --width '400' --height '100' 'DEF:T634=filename.rrd:t634:AVERAGE' 'LINE1:T634#0000FF:T634'
I get a graph with no line on it...
Is my rrd file creation wrong?
Thanks in advance for your help!
Your 'rrdtool graph' call does not specify a start and end time for the graph. The default is a 1-day graph ending at the current time. If the data are historical, the most recent data point may be outside the default graph time window. Specify a start and end time in your graph request.
You can verify that the data are in the RRD by using an 'rrdtool fetch' request.
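One way to avoid guessing the window is to derive it from the timestamps you fed to 'rrdtool update'. The Python sketch below only builds the argument list and does not run rrdtool; graph_window_args is a hypothetical helper, and the file and DS names are the ones from the question.

```python
def graph_window_args(timestamps, margin=0):
    """Build --start/--end arguments that cover the given update times."""
    start = min(timestamps) - margin
    end = max(timestamps) + margin
    return ["--start", str(start), "--end", str(end)]


# Timestamps taken from the update commands in the question.
updates = [1390231080, 1390231110, 1390231140, 1390231170]

cmd = (["rrdtool", "graph", "graph.png"]
       + graph_window_args(updates)
       + ["--width", "400", "--height", "100",
          "DEF:T634=filename.rrd:t634:AVERAGE",
          "LINE1:T634#0000FF:T634"])
print(" ".join(cmd))
```

The printed command line can be pasted into a shell, or the list passed to subprocess.run, once the RRD actually contains the data.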
I figured out the problem... There were too few data points in the file and the graph time offset was too large...
Thanks a lot for your help!

RRDTool database definition and plotting the data -I need a second opinion

Here is what I am trying to achieve:
I read my data once a day (the exact time of the day is not very important).
I want to archive the values for this DS for two years back.
I need to be able to look back for 2 years and I need the value for every day
and I also need to see the weekly average
If I miss a reading for two consecutive days the data should be declared unknown
Here is what I am using for this:
rrdtool create Carsforsale.rrd --start 20130217 --step 86400 ^
DS:MidsizeCars:GAUGE:172800:U:U ^
DS:FullSizeCars:GAUGE:172800:U:U ^
RRA:AVERAGE:0:7:104^
RRA:LAST:0:7:1:720
I updated the above database with
rrdtool update Carsforsale.rrd 1361203200:554:791
rrdtool update Carsforsale.rrd 1361289600:556:795
The updates correspond to yesterday and the day before yesterday (18, 19 Feb).
I tried to plot the graphs for the above using this
rrdtool graph "Inventory.png" \
--start "20130217" \
--imgformat PNG --width 850 --height 400 \
DEF:MidsizeCars=Carsforsale.rrd:MidsizeCars:AVERAGE \
DEF:FullSizeCars=Carsforsale.rrd:FullSizeCars:AVERAGE \
AREA:MidsizeCars#0000FF:"MidsizeCars" \
AREA:FullSizeCars#FF004D:"FullSizeCars:STACK"'
And now here are the my questions:
Are the step and the heartbeat defined correctly for what I want to do?
Why are my graphs empty?
Looking into the database with the freeware utility RRD Editor, I can see that the last values are stored in MidsizeCars and FullSizeCars, but the only archive that contains a history of what has been loaded into the database is the LAST one. Am I supposed to plot LAST or AVERAGE to see the current values?
Thanks
C
Since you want to keep the data for two years at 1-day resolution, you have to set up an appropriate RRA for this purpose. Since this will only be about 730 values, I would not bother with setting up an extra consolidated RRA for the week; this will get calculated on the fly.
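The row count follows directly from the retention period and the step. A small Python sketch of the arithmetic, where rra_rows is a hypothetical helper and the xff of 0.5 in the printed line is an assumption (the question used 0):

```python
import math


def rra_rows(retention_days, step_seconds, steps_per_row=1):
    """Number of rows an RRA needs to cover retention_days."""
    row_seconds = step_seconds * steps_per_row
    return math.ceil(retention_days * 86400 / row_seconds)


daily = rra_rows(2 * 365, 86400)         # one row per 86400 s step
print(f"RRA:AVERAGE:0.5:1:{daily}")      # xff 0.5 is an assumed value
```

With a step of 86400 seconds and one step per row, two years of history needs 730 rows.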