How to explain rrdtool output - rrdtool

First, I created an RRD database:
$ rrdtool create test.rrd --start 1200000000 --step 300 DS:test_1:GAUGE:600:0:100 RRA:AVERAGE:0.5:1:12
Second, I did some updates:
$ rrdtool update test.rrd 1200000100:1
$ rrdtool update test.rrd 1200000400:3
$ rrdtool update test.rrd 1200000700:4
$ rrdtool update test.rrd 1200001000:5
Third, I fetched data from test.rrd:
$ rrdtool fetch test.rrd -r 300 -s 1200000000 -e 1200001000 AVERAGE
Why is the value at 1200000300 equal to 2.333?

This is caused by Data Normalisation. RRDTool automatically adjusts data to fit exactly on the time boundaries of the defined Interval.
Although your data are spaced exactly at 300s intervals, the same as your defined Interval (step), unfortunately they are not on the actual boundaries.
The boundary is when time modulo step is equal to zero. In your case, that would be at time 1200000000 and not at 1200000100. Thus, the samples need to be adjusted: each stored interval takes one third of its value from the earlier sample and two thirds from the later one. This is further complicated because you are operating in Gauge mode, whereas RRDTool interpolates assuming a linear rate of change.
If you started your samples at time 1200000300 or at 1200000000 then you would see them stored exactly as given, because the normalisation step would become a null operation. Since you provide Gauge samples at 1200000100 and 1200000400, the stored value for 1200000300 will be two-thirds of the way along a line joining the two samples: 1 + (3 - 1) x 2/3 = 2.333, which is what you are getting.
The tutorial by Alex van den Bogaerdt explains this in detail.
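The arithmetic above can be reproduced with a minimal sketch. It assumes that each Gauge value applies to the whole period since the previous update and that a stored value is the time-weighted average over its step; heartbeat and min/max checks are ignored, and `normalise_gauge` is a hypothetical helper written for illustration, not part of RRDTool.

```python
def normalise_gauge(start, step, updates):
    """Return {interval_end: value} for every fully covered step.

    Assumption: the value given at time t covers the period (previous t, t].
    """
    buckets = {}  # interval end -> [weighted sum, seconds covered]
    prev = start
    for t, value in updates:
        cur = prev
        while cur < t:
            bucket_end = (cur // step + 1) * step   # next step boundary
            seg_end = min(bucket_end, t)
            acc = buckets.setdefault(bucket_end, [0.0, 0])
            acc[0] += value * (seg_end - cur)
            acc[1] += seg_end - cur
            cur = seg_end
        prev = t
    # Keep only intervals that are covered for the full step.
    return {end: total / secs
            for end, (total, secs) in sorted(buckets.items())
            if secs == step}


updates = [(1200000100, 1), (1200000400, 3), (1200000700, 4), (1200001000, 5)]
pdps = normalise_gauge(1200000000, 300, updates)
for end, value in pdps.items():
    print(end, round(value, 3))
# 1200000300 2.333
# 1200000600 3.667
# 1200000900 4.667
```

The first line is (100 x 1 + 200 x 3) / 300, which is the same number as the interpolation 1 + (3 - 1) x 2/3.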

Related

Differences of using the rrdtool RRD PDP or RRA consolidation function to calculate the average reading?

I'm updating an rrdtool round-robin database at 1-minute intervals. I want to store the average value of five updates as one RRA entry in the RRD. One way to do this is like this:
$ rrdtool create foo.rrd --start 1000000500 --step 60 \
> DS:ping:GAUGE:120:0:1000 RRA:AVERAGE:0.5:5:12; \
> rrdtool update foo.rrd 1000000560:10 1000000620:20 \
> 1000000680:30 1000000740:40 1000000800:50
It accumulates five readings and stores the average of those as an entry in RRA. However, one could achieve the same with this:
$ rrdtool create bar.rrd --start 1000000500 --step 300 \
> DS:ping:GAUGE:600:0:1000 RRA:AVERAGE:0.5:1:12; \
> rrdtool update bar.rrd 1000000560:10 1000000620:20 \
> 1000000680:30 1000000740:40 1000000800:50
As seen above, the step is 300 seconds, but as the RRD PDP accepts values between the intervals and calculates the average, both examples store 30 ((10+20+30+40+50)/5) in the RRA. One difference that I can see is that the first example requires at least three updates to store an entry in the RRA, while in the second example a single update within the 300-second step is enough. Are there any other differences?
These two examples are not really the same thing under the covers, though they can appear the same in some circumstances.
In the first, you have a 60s step, and your RRA stores the average of 5 PDPs in each CDP.
In the second, you have a 300s step, and your RRA stores each PDP as a CDP.
Here are some differences:
In the first, you will need at least one sample (PDP) every 2 minutes; so three to cover each CDP in the RRA. In the second, you need a single sample every CDP.
In the first, data normalisation of each sample happens over a 60s window. In the second, it happens over a 300s window. This will make things look different when the samples arrive irregularly.
In the first, you can have up to 120s with no data before you get an Unknown; in the second, up to 600s.
While the RRA outcome is pretty much the same in both, which you choose would depend on the nature of your incoming data (how often you get samples, how irregular these are), and your storage and display requirements (if you need higher granularity stored or displayed). The first option is more accurate if you have frequent samples; the second is less storage and less work but may sacrifice some data if you have updates more frequent than the step.
Note that, if you have RRA types other than AVERAGE, having a smaller step will make calculations more accurate.
In general, I would recommend that you set the step to be close to the expected average interval between samples, with a latency setting according to how regular the data are. Set your RRA consolidation depending on how you need to view and display the data, and how long you are required to hold history for. Create RRAs corresponding to normal display rollups, in order to minimise the amount of on-the-fly calculation.
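The note about RRA types other than AVERAGE can be illustrated with a small sketch. It only models the arithmetic for the five readings from the question, assuming they all land exactly on step boundaries; it does not call RRDTool.

```python
readings = [10, 20, 30, 40, 50]  # one reading per 60 s, as in the question

# foo.rrd: step 60, so every reading is its own PDP and the RRA
# consolidates 5 PDPs into one CDP.
pdps_foo = readings
avg_foo = sum(pdps_foo) / len(pdps_foo)
max_foo = max(pdps_foo)

# bar.rrd: step 300, so the five readings are averaged into a single
# PDP before any consolidation function sees them.
pdps_bar = [sum(readings) / len(readings)]
avg_bar = sum(pdps_bar) / len(pdps_bar)
max_bar = max(pdps_bar)

print(avg_foo, avg_bar)  # 30.0 30.0
print(max_foo, max_bar)  # 50 30.0
```

AVERAGE comes out the same either way, but a MAX RRA on the 300s step can no longer see the 50, because the peak was averaged away when the PDP was built.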

rrdtool update multiple datasources in two commands

RRD does not update the second datasource correctly, see:
First, I create the RRD file with two datasources (c1 and c2):
rrdtool create test.rrd --start N --step 60 DS:c1:GAUGE:120:0:100 DS:c2:GAUGE:120:0:100 RRA:AVERAGE:0.5:1:1440
Then I update the two datasources in two commands:
rrdtool update test.rrd -t c1 N:10 && rrdtool update test.rrd -t c2 N:10
Wait for 60 seconds....
Then I do another update:
rrdtool update test.rrd -t c1 N:20 && rrdtool update test.rrd -t c2 N:20
Then let's see what we have:
rrdtool fetch test.rrd AVERAGE | tail -5
1468409580: -nan -nan
1468409640: -nan -nan
1468409700: -nan -nan
1468409760: 1,5988575517e+01 1,9266620475e-01
1468409820: -nan -nan
The first datasource c1 works as expected, but the second, c2, shows a value lower than 1, where I would also expect a value close to 15.
Yes, I know I can also update both datasources in ONE update command, but in my case I have a lot of datasources in one RRD file, and separate commands make the mass of values easier to read and follow.
Used rrd version : 1.6.0
This is, of course, Data Normalisation. It is also caused by your updating the two datasources in two separate calls.
If you instead use:
rrdtool update test.rrd -t c1:c2 N:10:10
rrdtool update test.rrd -t c1:c2 N:20:20
then you will be updating both DSs at the same time. You see, when you do it in separate updates, what you're actually doing is implicitly updating the other DS with 'unknown' and then relying on the automatic interpolation to fill things in. RRDTool is not a relational database, and you cannot update values in a time window independently without affecting the other values.
The other issue is Data Normalisation, where values are adjusted temporally to fit into the exact time boundaries, and in doing so are adjusted to be linearly equivalent. The practical upshot when graphing network traffic (big numbers) is pretty much the same, and the overall totals and averages are consistent, but smaller point-in-time values end up as decimals like this.
So, two things:
Update your DSs all together, not in separate calls.
Try to update exactly on the time boundary. (Instead of using 'N', use an exact time, rounded to the nearest minute.)
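Both suggestions can be combined in a short sketch. It only builds the argument list for a single `rrdtool update` call that uses the `-t` template shown above; `aligned_update_args` is a hypothetical helper, and rounding down to the previous boundary is just one possible choice of rounding.

```python
import time

STEP = 60  # seconds, matching --step 60 in the create command


def aligned_update_args(rrd_path, values, now=None, step=STEP):
    """Build argv for one update that sets every DS at a step boundary."""
    now = int(time.time()) if now is None else int(now)
    ts = now - now % step                      # round down to the boundary
    template = ":".join(values)                # e.g. "c1:c2"
    data = ":".join(str(v) for v in values.values())
    return ["rrdtool", "update", rrd_path, "-t", template, f"{ts}:{data}"]


args = aligned_update_args("test.rrd", {"c1": 10, "c2": 10}, now=1468409777)
print(" ".join(args))
# rrdtool update test.rrd -t c1:c2 1468409760:10:10
```

The list can be handed to `subprocess.run(args, check=True)`. Keeping the values in a dict keeps each DS name next to its value, which may help readability when a file has many datasources.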

RRD tool retrieve maximum value

I am trying to get the speed and maximum values for all of my interfaces using rrdtool fetch, but the max value it returns is per standard time interval.
Example:
rrdtool fetch xxx.rrd MAX -r 7200 -s 1357041600 -e now
I just need to get the single highest maximum value for 1 year.
Use rrdtool graph VDEF and PRINT without giving any actual graphing commands. This will return the number you are looking for.
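As a rough sketch of that suggestion, the following builds the argument list for such a call. The DS name `traffic_in` is a placeholder (the question does not give the real one), `peak_value_args` is a hypothetical helper, and writing the image to /dev/null is an assumption about how to discard the unused graph.

```python
def peak_value_args(rrd_path, ds_name, start, end="now"):
    """Build argv for rrdtool graph that only PRINTs the overall maximum."""
    return [
        "rrdtool", "graph", "/dev/null",       # no image is wanted
        "--start", str(start), "--end", str(end),
        f"DEF:v={rrd_path}:{ds_name}:MAX",     # read the MAX RRA
        "VDEF:peak=v,MAXIMUM",                 # reduce the series to one number
        "PRINT:peak:%lf",                      # print it instead of drawing
    ]


# "traffic_in" is a hypothetical DS name; substitute the real one.
args = peak_value_args("xxx.rrd", "traffic_in", 1357041600)
print(" ".join(args))
```

Running the list with `subprocess.run(args, capture_output=True, text=True)` should give the printed value in the command's standard output.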

RRDTool database definition and plotting the data - I need a second opinion

Here is what I am trying to achieve:
I read my data once a day (the exact time of the day is not very important).
I want to archive the values for this DS for two years back.
I need to be able to look back for 2 years and I need the value for every day
and I also need to see the weekly average
If I miss a reading for two consecutive days the data should be declared unknown
Here is what I am using for this:
rrdtool create Carsforsale.rrd --start 20130217 --step 86400 ^
DS:MidsizeCars:GAUGE:172800:U:U ^
DS:FullSizeCars:GAUGE:172800:U:U ^
RRA:AVERAGE:0:7:104^
RRA:LAST:0:7:1:720
I updated the above database with
rrdtool update Carsforsale.rrd 1361203200:554:791
rrdtool update Carsforsale.rrd 1361289600:556:795
The updates correspond to yesterday and the day before yesterday (18 and 19 Feb).
I tried to plot the graphs for the above using this
rrdtool graph "Inventory.png" \
--start "20130217" \
--imgformat PNG --width 850 --height 400 \
DEF:MidsizeCars=Carsforsale.rrd:MidsizeCars:AVERAGE \
DEF:FullSizeCars=Carsforsale.rrd:FullSizeCars:AVERAGE \
AREA:MidsizeCars#0000FF:"MidsizeCars" \
AREA:FullSizeCars#FF004D:"FullSizeCars:STACK"'
And now here are my questions:
Are the step and the heartbeat defined correctly for what I want to do?
Why are my graphs empty?
Looking into the database with the freeware utility RRD Editor, I could see that the last values are stored for MidsizeCars and FullSizeCars, but the only place that contains a history of what has been loaded into the database is the LAST archive. Am I supposed to plot LAST or AVERAGE to see the current values?
Thanks
C
Since you want to keep the data for two years at 1-day resolution, you have to set up an appropriate RRA for this purpose. Since this will only be about 730 values, I would not bother with setting up an extra consolidated RRA for the week; this will get calculated on the fly.
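The row count for such an RRA can be worked out with a small sketch. The helper `rra` is hypothetical, and the xff value of 0.5 is an assumption, since the answer does not specify one.

```python
STEP = 86400  # one reading per day, as in the create command


def rra(cf, steps_per_row, keep_seconds, xff=0.5, step=STEP):
    """Format an RRA definition that holds keep_seconds of history."""
    seconds_per_row = steps_per_row * step
    rows = -(-keep_seconds // seconds_per_row)  # ceiling division
    return f"RRA:{cf}:{xff}:{steps_per_row}:{rows}"


two_years = 2 * 365 * 86400
daily = rra("AVERAGE", 1, two_years)
print(daily)  # RRA:AVERAGE:0.5:1:730
```

With one row per day, a weekly average can then be derived at graph time from the daily rows rather than stored separately.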

rrdtool graph slightly different graph

First, some background: I have set up a system that records the per-minute totals of HTTP responses (200, 301, 302, etc.), so I can tell how things are going performance-wise for users. My boss is driving me mad over something that I think is related to RRD internals, but I am supposed to solve it.
What do I do with rrdtool?
After summarizing the different HTTP responses for a minute (60 seconds), I insert the values with the timestamp into the RRD database.
This is the rrd file definition:
/usr/bin/rrdtool create file.rrd --start $_[7]-60 --step 60 DS:200:GAUGE:120:U:U DS:300:GAUGE:120:U:U DS:400:GAUGE:120:U:U DS:404:GAUGE:120:U:U DS:500:GAUGE:120:U:U DS:502:GAUGE:120:U:U DS:504:GAUGE:120:U:U RRA:AVERAGE:0.5:1:43200
As you can see, the RRA keeps 43200 rows, which means 30 days of 60-second values.
The problem comes when I draw. This is the command I use to draw the graph of the last 6 hours (where $start is the start time, $time the end time and $rrd the RRD file):
{/usr/bin/rrdtool graph last6hours.png --units=si --alt-y-grid --start $start --end $time -o -S 60 --width 600 --height 200 --imgformat PNG DEF:200=$rrd:200:AVERAGE LINE1:200#006666:"200" DEF:300=$rrd:300:AVERAGE LINE1:300#FF00CC:\"301+302\" DEF:400=$rrd:400:AVERAGE LINE1:400#000000:\"400\" DEF:404=$rrd:404:AVERAGE LINE1:404#6666CC:\"404\" DEF:500=$rrd:500:AVERAGE LINE1:500#00FF66:\"500\" DEF:502=$rrd:502:AVERAGE LINE1:502#FF0000:\"502\" DEF:504=$rrd:504:AVERAGE LINE1:504#FF9900:\"504\";}
And this is the one I use to draw the last 12 hours:
{/usr/bin/rrdtool graph last12hours.png --units=si --alt-y-grid --start $start --end $time -o -S 60 --width 600 --height 200 --imgformat PNG DEF:200=$rrd:200:AVERAGE LINE1:200#006666:"200" DEF:300=$rrd:300:AVERAGE LINE1:300#FF00CC:\"301+302\" DEF:400=$rrd:400:AVERAGE LINE1:400#000000:\"400\" DEF:404=$rrd:404:AVERAGE LINE1:404#6666CC:\"404\" DEF:500=$rrd:500:AVERAGE LINE1:500#00FF66:\"500\" DEF:502=$rrd:502:AVERAGE LINE1:502#FF0000:\"502\" DEF:504=$rrd:504:AVERAGE LINE1:504#FF9900:\"504\";}
Now please look at the graphs. Inside the red circle in the first graph, the 200 responses drop all the way to 0, but in the graph of the last 12 hours the same drop does not reach 0. My boss is pressing me, saying that the data is not real when it is. I know this is down to rrdtool internals, but I don't know how to solve it.
Any suggestions, please?
This change is due to the fact that rrdtool is consolidating data, adapting it to the resolution of the chart you are drawing. Your initial chart shows high-resolution data, while the second chart covers a wider time range and thus shows several data points wrapped into one. Consider the following:
original: 10,10,10,0,10,10
consolidated 2 to 1: 10,5,10
If you want to preserve extremes, you should set up MIN and MAX RRAs and use those for charting the extremes.
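The 2-to-1 example above can be reproduced with a minimal sketch that also shows what MIN and MAX consolidation would keep. `consolidate` is a hypothetical helper that only models the arithmetic, not rrdtool's own code.

```python
def consolidate(values, n, cf):
    """Collapse every n consecutive values into one using function cf."""
    return [cf(values[i:i + n]) for i in range(0, len(values), n)]


def average(chunk):
    return sum(chunk) / len(chunk)


original = [10, 10, 10, 0, 10, 10]

print(consolidate(original, 2, average))  # [10.0, 5.0, 10.0]
print(consolidate(original, 2, min))      # [10, 0, 10]
print(consolidate(original, 2, max))      # [10, 10, 10]
```

With AVERAGE the drop to 0 becomes 5, while MIN keeps the 0, which is why the wider chart needs a MIN RRA to show the same dip.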
hth
tobi