rrdtool graph slightly different graph

First, I will say that I have set up a system where I register, each minute, the total of HTTP responses (200, 301, 302, etc.), so I can see how things are going performance-wise for the users. Even so, my boss is driving me mad with something that I think is related to rrd internals, but supposedly I must solve it.
What do I do with rrdtool?
Every minute (60 seconds) I summarize the different HTTP responses and insert the values with the timestamp into the rrd database.
This is the rrd file definition:
/usr/bin/rrdtool create file.rrd --start $_[7]-60 --step 60 \
DS:200:GAUGE:120:U:U DS:300:GAUGE:120:U:U DS:400:GAUGE:120:U:U \
DS:404:GAUGE:120:U:U DS:500:GAUGE:120:U:U DS:502:GAUGE:120:U:U \
DS:504:GAUGE:120:U:U \
RRA:AVERAGE:0.5:1:43200
As you can see in the RRA, I keep 43200 rows, which means 30 days of 60-second values.
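(For illustration, a single per-minute insert then looks like this; the timestamp and the counts are made up:)
# one value per DS, in the order they were defined (200 300 400 404 500 502 504)
/usr/bin/rrdtool update file.rrd 1355000040:1234:56:7:89:2:0:1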
The problem comes when I draw. This is the command I use to draw the graph of the last 6 hours (where $start is the start time, $time the end time and $rrd the rrd file):
/usr/bin/rrdtool graph last6hours.png --units=si --alt-y-grid \
--start $start --end $time -o -S 60 \
--width 600 --height 200 --imgformat PNG \
DEF:200=$rrd:200:AVERAGE LINE1:200#006666:"200" \
DEF:300=$rrd:300:AVERAGE LINE1:300#FF00CC:"301+302" \
DEF:400=$rrd:400:AVERAGE LINE1:400#000000:"400" \
DEF:404=$rrd:404:AVERAGE LINE1:404#6666CC:"404" \
DEF:500=$rrd:500:AVERAGE LINE1:500#00FF66:"500" \
DEF:502=$rrd:502:AVERAGE LINE1:502#FF0000:"502" \
DEF:504=$rrd:504:AVERAGE LINE1:504#FF9900:"504"
And this is the one I use to draw the last 12 hours:
/usr/bin/rrdtool graph last12hours.png --units=si --alt-y-grid \
--start $start --end $time -o -S 60 \
--width 600 --height 200 --imgformat PNG \
DEF:200=$rrd:200:AVERAGE LINE1:200#006666:"200" \
DEF:300=$rrd:300:AVERAGE LINE1:300#FF00CC:"301+302" \
DEF:400=$rrd:400:AVERAGE LINE1:400#000000:"400" \
DEF:404=$rrd:404:AVERAGE LINE1:404#6666CC:"404" \
DEF:500=$rrd:500:AVERAGE LINE1:500#00FF66:"500" \
DEF:502=$rrd:502:AVERAGE LINE1:502#FF0000:"502" \
DEF:504=$rrd:504:AVERAGE LINE1:504#FF9900:"504"
Now please look at the graphs: in the first one, inside the red circle, there is a drop of the 200 responses all the way down to 0, but in the graph of the last 12 hours the same drop does not go down to 0. So my boss is pressing me, saying that the data is not real when it is; the worst part is that I know it is real and that this is about rrdtool internals, but I don't know how to solve it.
Any suggestion, please?

This change is due to the fact that rrdtool is consolidating data, adapting it to the resolution of the chart you are drawing. Your initial chart shows high-resolution data, while the second chart covers a wider time range and thus shows several data points wrapped into one. Consider the following:
original: 10,10,10,0,10,10
consolidated 2 to 1: 10,5,10
If you want to preserve the extremes, you should set up a MIN and a MAX RRA and use those for charting the extremes.
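For example, a minimal sketch of such a setup (the extra RRAs have to exist in the file, so it would have to be recreated or the data migrated):
/usr/bin/rrdtool create file.rrd --start $_[7]-60 --step 60 \
DS:200:GAUGE:120:U:U DS:300:GAUGE:120:U:U DS:400:GAUGE:120:U:U \
DS:404:GAUGE:120:U:U DS:500:GAUGE:120:U:U DS:502:GAUGE:120:U:U \
DS:504:GAUGE:120:U:U \
RRA:AVERAGE:0.5:1:43200 \
RRA:MIN:0.5:1:43200 \
RRA:MAX:0.5:1:43200
and then chart the extremes in the graph with MAX (and/or MIN) DEFs next to the averages, e.g.:
DEF:200max=$rrd:200:MAX \
LINE1:200max#003333:"200 max"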
hth
tobi

Related

Differences of using the rrdtool RRD PDP or RRA consolidation function to calculate the average reading?

I'm updating a rrdtool round-robin database with 1 minute intervals. I want to store the average value of five updates as one RRA entry in rrdtool RRD. One way to do this is like this:
$ rrdtool create foo.rrd --start 1000000500 --step 60 \
> DS:ping:GAUGE:120:0:1000 RRA:AVERAGE:0.5:5:12; \
> rrdtool update foo.rrd 1000000560:10 1000000620:20 \
> 1000000680:30 1000000740:40 1000000800:50
It accumulates five readings and stores the average of those as an entry in RRA. However, one could achieve the same with this:
$ rrdtool create bar.rrd --start 1000000500 --step 300 \
> DS:ping:GAUGE:600:0:1000 RRA:AVERAGE:0.5:1:12; \
> rrdtool update bar.rrd 1000000560:10 1000000620:20 \
> 1000000680:30 1000000740:40 1000000800:50
As seen above, the step is 300 seconds, but as the RRD PDP accepts values between the intervals and calculates the average, both examples store 30 ((10+20+30+40+50)/5) in the RRA. One difference, as far as I can tell, is that the first example requires at least three updates to store an entry in the RRA, while in the second example a single update within the 300-second step is enough. Are there any other differences?
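(For what it's worth, what actually lands in the RRAs can be checked with rrdtool fetch; with the two files created above, both should show the consolidated value 30 for the interval ending at 1000000800:)
$ rrdtool fetch foo.rrd AVERAGE --start 1000000500 --end 1000000800
$ rrdtool fetch bar.rrd AVERAGE --start 1000000500 --end 1000000800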
These two examples are not really the same thing under the covers, though they can appear the same in some circumstances.
In the first, you have a 60s step, and your RRA stores the average of 5 PDPs in each CDP.
In the second, you have a 300s step, and your RRA stores each PDP as a CDP.
Here are some differences:
In the first, you will need at least one sample (PDP) every 2 minutes, and at least three known PDPs (given the 0.5 xff) to cover each CDP in the RRA. In the second, a single sample per CDP is enough.
In the first, data normalisation of each sample happens over a 60s window. In the second, it happens over a 300s window. This will make things look different when the samples arrive irregularly.
In the first, you can have up to 120s with no data before you get an Unknown; in the second, up to 600s.
While the RRA outcome is pretty much the same in both, which one you choose would depend on the nature of your incoming data (how often you get samples, how irregular they are) and on your storage and display requirements (whether you need higher granularity stored or displayed). The first option is more accurate if you have frequent samples; the second needs less storage and less work, but may sacrifice some data if you have updates more frequent than the step.
Note that, if you have other RRA types than just AVERAGE, having a smaller step will make the calculations more accurate.
In general, I would recommend that you set the step to be close to the expected average sample frequency, with the heartbeat set according to how regular the data are. Set your RRA consolidation depending on how you need to view and display the data, and on how long you are required to hold history for. Create RRAs corresponding to your normal display rollups, in order to minimise the amount of on-the-fly calculation.
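As a rough sketch of that recommendation (the numbers are assumptions for a source that delivers a sample roughly every 60 seconds):
# 60s step, 2-minute heartbeat; RRAs match the usual display rollups:
# 1-minute data for a day, 5-minute for a week, 30-minute for a month,
# 6-hour averages for a year
rrdtool create example.rrd --step 60 \
DS:ping:GAUGE:120:0:1000 \
RRA:AVERAGE:0.5:1:1440 \
RRA:AVERAGE:0.5:5:2016 \
RRA:AVERAGE:0.5:30:1488 \
RRA:AVERAGE:0.5:360:1460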

Rrdtool graphing a digital event without jagged edges

I'm reading my energy meter output to keep track of actual energy used, etc. The energy cost is calculated with a tarif, and that changes during the day from one state to another (1 or 2). When I graph the tarif together with the actual usage over time, the sharp edge of the tarif state gets jagged, possibly caused by the averaging.
I have used DS:tarif:GAUGE:... to setup the db, and I'm using DEF:tarif:xx.rrd:tarif:AVERAGE to graph.
How do I record and graph a "digital" signal with sharp edges?
There are two things that you might be referring to here.
First, make sure you are not using --slope-mode. This option uses slopes rather than steps in the graph; it sounds like the wrong option for you.
Next, you have the problem that your tariff DS is varying between two values (let's say a and b), but when you look at the higher-level graphs, this starts to average out and looks wrong.
When looking at your high-granularity (i.e. close-up) graphs, you have one (or more) pixels per RRA data point, and one sample per RRA data point. So, you'll see either a or b. However, when you move to a lower-granularity (i.e. further away) graph, RRDtool will start to have multiple data points per pixel. Depending on your RRD file definition, rrdtool will then either consolidate on the fly, or move to using a different RRA with more samples per data point.
So, this means that you have multiple samples per pixel, and they need to somehow be combined. By default, RRDTool will average them, which can result in the jagged behaviour.
However, what do you want to happen? If the time interval corresponding to a single pixel has 2 instances of a and three of b, what should the graph show?
Here are a couple of suggestions on how you might do it.
Use background.
Since your tariff has only two values, you could use this to colour the background - e.g. make a red or green background depending on the tariff, and then draw your usage graph line over the top of it. The background colour can be done by having an area from 0 to INF using a semi-transparent colour like #ff808080.
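A minimal sketch of that idea (xx.rrd is the file name from your DEF; the usage DS name is a placeholder, and the tariff is assumed to be stored as 1 or 2):
DEF:tarif=xx.rrd:tarif:AVERAGE \
DEF:usage=xx.rrd:usage:AVERAGE \
CDEF:hi_bg=tarif,1,GT,INF,UNKN,IF \
AREA:hi_bg#FF808080:"high tariff" \
LINE1:usage#006600:"usage"
The CDEF paints an area up to INF only where the tariff is above 1, and stays unknown (i.e. transparent) everywhere else, so the usage line is simply drawn over a coloured or plain background.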
Use a special MAX (or MIN) RRA
Maybe you just want to display the maximum tariff for that period. So, you can create an additional MAX-type RRA for each consolidation interval, and graph the MAX. Of course, this means that when you have 1 pixel = 1 day, you'll be seeing just the higher value as a straight line. I suppose a 'median' consolidation would be useful here, but for obvious reasons RRDTool doesn't have this.
A bigger graph
A large graph means more pixels, which means less averaging required.
Display your data differently
What are you trying to visualise? Possibly you don't need to see the average cost per unit over this time window, and a calculated sum of units x tariff would work better - particularly since this would be able to be averaged up without problem.
Don't use RRDTool
RRDTool is designed to progressively normalise, summarise and expire regular time-series data over time, and does so very efficiently. However, if you're interested in having the exact data forever then maybe you need a different database.
Pre-normalise your sampling times
If your data change frequently, then Data Normalisation might be changing them. Make sure you always store the data on a time step boundary - if your RRD step is 300 (5min) then ensure you store the data at timestamps which are a multiple of 300, and don't use N (meaning 'whatever the time is Now').
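A sketch of that, assuming a 300-second step and a shell-based updater (the file name and the value variable are placeholders):
STEP=300
NOW=$(date +%s)
TS=$(( NOW / STEP * STEP ))           # round down to the step boundary
rrdtool update xx.rrd ${TS}:${VALUE}  # instead of rrdtool update xx.rrd N:${VALUE}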
Thank you for this very elaborate answer. Because I have not been able to find suitable answers to my problem, your summary, together with my situation, may hopefully also help others.
Looking back, I should have included more information, and before addressing your points, let me do that right now:
Here is a snippet of my database definition script:
# 5 min sample rate = 300 step rate
# 1D report: 1 step every 5 minutes (5 min sample rate=1), 20 per hr, 24hrs = 480 slots
# 1Wk report: 1 step every hour (20 x 5 min samples=20), 24 hrs, 7 days = 168 slots
# 1M report: 1 step every hour (20 x 5 min samples=20), 24 hrs, 30 days = 720 slots
# 1Y report: 1 step every day (24 Hr * 20 samples=480), 365 days = 365 slots
rrdtool create energy_mon.rrd --step 300 --start 1480943366 \
DS:meter_total:COUNTER:600:U:U \
DS:meter_low:COUNTER:600:U:U \
DS:meter_hi:COUNTER:600:U:U \
DS:energy:GAUGE:600:U:U \
DS:tarif:GAUGE:600:U:U \
RRA:AVERAGE:0.5:1:480 \
RRA:AVERAGE:0.5:20:168 \
RRA:AVERAGE:0.5:20:720 \
RRA:AVERAGE:0.5:480:365
Here is a snippet of my graphing script:
#daily
rrdtool graph $GDIR/energy_daily.png --start -1d \
-w 675 -h 250 \
--vertical-label "KWatt" \
--lower-limit=0 \
--watermark "`date`" \
DEF:energy=$DIR/energy_mon.rrd:energy:AVERAGE \
LINE1:energy$GREEN_COLOR:"Energie" \
DEF:tarif=$DIR/energy_mon.rrd:tarif:AVERAGE \
LINE1:tarif$BLACK_COLOR:"Tarief"
#two days
rrdtool graph $GDIR/energy_2days.png --start -2d \
-w 675 -h 250 \
--vertical-label "KWatt" \
--lower-limit=0 \
--watermark "`date`" \
DEF:energy=$DIR/energy_mon.rrd:energy:AVERAGE \
LINE1:energy$GREEN_COLOR:"Energie" \
DEF:tarif=$DIR/energy_mon.rrd:tarif:AVERAGE \
LINE1:tarif$BLACK_COLOR:"Tarief"
Here is a graph of the "1 day" result:
1 day
Here is a graph of the "last 48 hrs" :
2 day
In order to see my "tarif" on the graphs together with the actual energy usage in KWatts, I multiply the 1 (low) or 2 (normal) setting by 100 and store that number in the database. On weekdays, the tariff changes at 23:00 hrs to "low" and back to normal at 07:00 hrs. On the "daily" graph this is reported correctly, albeit with a 5 min. sample resolution "error".
As you can see, the "last 48 hrs" graph, set by "-2d" in the graphing instructions, should be using the same RRA data from the database, but the tarif line does not switch from 100 to 200; instead it "sits" at about half way for approx. 1-1/2 hours, or more than 30 samples. This is not a rounding caused by pixel-level resolution (;-).
The only explanation I have at this moment is that the "-2d" graph uses the second RRA, which is actually intended for a week's worth of data. If this is the answer, then how does one influence the relationship between the "-d/w/m/y" settings in the graphing instructions and the intended RRAs? I don't see a connection between the two, so it is probably selected automatically, and most likely (if it's that smart) based on the number of data points. The first RRA does not have enough, so it switches to using the second RRA. Seems plausible, but is this true?
BTW, using "-1w", "-1m" and "-1y" for the other graphs I use shows the same weird results on the tarif "signal".
In the meantime, I have been able to "fix" the graphs without redefining my database by adding a CDEF. Here is what I use for all graphs other than the "1d".
DEF:tarif=$DIR/energy_mon.rrd:tarif:AVERAGE \
CDEF:normaal=tarif,100,GT,200,100,IF \
LINE1:normaal$BLACK_COLOR:"Tarief"
To explain the CDEF: IF the stored tarif value is > 100 (which means the normal tariff, since 1 = low is stored as 100 and 2 = normal as 200), the result "normaal" is set to 200, the high tariff; IF not, it is set to 100, i.e. "dal" or low.
Your advice to use the background color for tariff is a very good one, and because I can use that on the current database, I have tried that too.
So is your MIN/MAX solution; I thought about that earlier but put it off because it would require setting up a new database and filling it with previous data (which I have in .csv form, and I have gone through that tedious exercise before - which is why I have the --start in the rrdtool create).
But, I would like to get to the bottom of this issue.
Do you or anybody else have any more words of wisdom to share, or can you verify/explain the workings of the RRA <==> DEF relationship or selection?

rrdtool: Compute 95th percentile of data within a sliding window

I'm using rrdtool to graph data about CPU usage as produced and stored by Munin. Munin (at least for us) stores each data-series in an .rrd file with 12 RRAs: "MIN", "MAX", and "AVERAGE" over each of the four periods "last 2d in 5m intervals", "last 9d in 30m intervals", "last 270d in 12h intervals", and "last 177y in 144d intervals".
I already know how to use rrdtool graph to produce a trend line indicating where my average CPU usage is going. (For simplicity, we can pretend I'm on a single-CPU system; in real life I have more code to deal with that.)
rrdtool graph /tmp/foo.png \
--start -12w --end +24w \
--lower-limit 0 --upper-limit 100 --rigid \
--title 'cpu usage' --width 620 --height 200 --border 0 \
--vertical-label 'cpu usage' \
DEF:idle=/var/lib/munin/mybox/mybox-cpu-idle-d.rrd:42:AVERAGE \
DEF:iowait=/var/lib/munin/mybox/mybox-cpu-iowait-d.rrd:42:AVERAGE \
CDEF:percent_used=100,idle,-,iowait,- \
AREA:percent_used#00880077:'cpu usage' \
VDEF:fit_m=percent_used,LSLSLOPE \
VDEF:fit_b=percent_used,LSLINT \
CDEF:trendline=percent_used,POP,fit_m,COUNT,*,fit_b,+ \
LINE1:trendline#FFBB00:'Trend since 12w ago'
The problem with this graph is that it shows only the average CPU usage trend. But my workload is spiky: usage is very low 90% of the time and then has brief spikes. What I really care about is the trend of the spikes in CPU usage.
So I could run the same command replacing AVERAGE with MAX... but the actual maxes are so randomly distributed (and usually close to 100%) that they don't produce any useful trend line.
So I'm thinking that the graph I actually want would be a graph of the 95th percentile (or maybe just the 75th percentile... ideally I'd be able to adjust the parameter), where that "percentile" is taken over the data in each consecutive 24-hour period.
Conceptually, I want to boil down our last 9 days of data (48 data points per day) into just 9 data points (1 data point per day — representing the Nth percentile of the 48 original points from that day).
And then I'd fit a line to that data using LSLSLOPE and LSLINT and display it on the same graph as the rest of this stuff.
But I can't figure out how to boil down the data in this way, using rrdtool's RPN facilities.
I know that I can use PERCENTNAN to get the scalar number that is the 95th percentile of my whole data-series, but I want a data-series consisting of 9 numbers, not just one scalar.
I know that I can use TRENDNAN to get a data-series that is the mean of a sliding window of my data-series, which would be good enough if only it gave me the median (50th percentile) instead of the mean, and then allowed me to adjust that parameter from "50" up to "95"... but it doesn't.
Alternatively, I know how to use Python to compute the series I want, using rrdtool fetch first, but then there's no simple way to feed that series back into rrdtool to create the graph.
I'm thinking maybe I could extract usage_today, usage_yesterday, usage_2d, usage_3d,... into nine separate series, use PERCENTNAN on them all individually, and then somehow fit a line to that. But that's mostly desperate handwaving; if someone posted an answer that actually made that approach work, I'd accept it.
RRDTool has 95th-percentile functionality built in. Note that the accuracy of the percentile calculations will depend on the granularity of the data available in the requested time period, though... so the bigger your 1-pdp RRA is, the better.
So, for example, to get a horizontal line at the 95th percentile, we can use these directives:
DEF:idlehr=/var/lib/munin/mybox/mybox-cpu-idle-d.rrd:42:AVERAGE:step=1
VDEF:pctidle=idlehr,95,PERCENTNAN
HRULE:pctidle#ff0000:95th_Percentile
The step=1 on the end of the DEF ensures that the highest-resolution data available will be selected. This may be computationally intensive if you're graphing a full year and high-resolution data are available for this time window!
The problem is, though, that you want a graph showing a different value for each day -- in effect, a sliding window of percentile calculations, in the same way as TREND and PREDICT work, but with a step of one day. RRDTool cannot do this.
So, the answer is, you can show a graph for one day with a single value percentile for that day. You cannot create a graph with one data point per day, where that data point is calculated as the percentile for that day.
The only way I can think of to achieve this is to call rrdtool xport repeatedly to calculate the percentile values for a sequence of days, and then use that data to generate a bar graph in another graphing package.
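If it helps, here is a rough sketch of that per-day iteration; instead of post-processing xport output, each call lets rrdtool itself compute the day's percentile with PERCENTNAN and emit it via PRINT (paths and the DS index are taken from the question above; iowait is omitted for brevity, and the throw-away image goes to /dev/null):
#!/bin/bash
RRD=/var/lib/munin/mybox/mybox-cpu-idle-d.rrd
now=$(date +%s)
for d in $(seq 9 -1 1); do
  end=$(( now - (d - 1) * 86400 ))
  start=$(( end - 86400 ))
  rrdtool graph /dev/null --start $start --end $end \
    DEF:idle=$RRD:42:AVERAGE:step=1 \
    CDEF:used=100,idle,- \
    VDEF:p95=used,95,PERCENTNAN \
    PRINT:p95:"p95 %.2lf" | awk -v day=$d '/^p95/ {print "day -" day ": " $2}'
done
The resulting nine values would then be plotted (or trend-fitted) in another tool, as suggested above.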

RRD Tool graph is not generated correctly for one week and yearly

I have a case where I have collected SNMP data and stored it via rrdtool.
The daily and weekly graphs come out correctly, but when I look at the monthly and yearly ones they show only that day's portion and not the correct graph, as shown below.
Daily Graph code is : (working correct)
/usr/bin/rrdtool graph /opt/elitecore/ManageEngine/AppManager11/working/graphs/daily-tps.png \
-v "TPS" -t "TIME" -s -86400 \
DEF:tps1=/root/graphs/Total_TPS.rrd:TPS:MAX \
CDEF:tps2=tps1,300,* \
LINE1:tps2#ff0000:TOTAL_TPS \
GPRINT:tps2:LAST:"Cur: %5.2lf" \
GPRINT:tps2:AVERAGE:"Avg: %5.2lf" \
GPRINT:tps2:MAX:"Max: %5.2lf" \
GPRINT:tps2:MIN:"Min: %5.2lf\t\t\t"
Monthly Graph code is : (not coming graph as expected)
/usr/bin/rrdtool graph /opt/elitecore/ManageEngine/AppManager11/working/graphs/monthly-tps.png \
-v "TPS" -t "WEEK" -s -2592000 \
DEF:tps1=/root/graphs/Total_TPS.rrd:TPS:MAX \
CDEF:tps2=tps1,300,* \
LINE1:tps2#ff0000:TOTAL_TPS \
GPRINT:tps2:LAST:"Cur: %5.2lf" \
GPRINT:tps2:AVERAGE:"Avg: %5.2lf" \
GPRINT:tps2:MAX:"Max: %5.2lf" \
GPRINT:tps2:MIN:"Min: %5.2lf\t\t\t"
Yearly Graph code is : (not coming graph as expected)
/usr/bin/rrdtool graph /opt/elitecore/ManageEngine/AppManager11/working/graphs/yearly-tps.png \
-v "TPS" -t "MONTH" -s -31536000 \
DEF:tps1=/root/graphs/Total_TPS.rrd:TPS:MAX \
CDEF:tps2=tps1,300,* \
LINE1:tps2#ff0000:TOTAL_TPS \
GPRINT:tps2:LAST:"Cur: %5.2lf" \
GPRINT:tps2:AVERAGE:"Avg: %5.2lf" \
GPRINT:tps2:MAX:"Max: %5.2lf" \
GPRINT:tps2:MIN:"Min: %5.2lf\t\t\t"
Kindly let me know if I am doing anything wrong.
Yours faithfully,
Jignesh Dholakiya
Answer
The graphs only show about five days' worth of data because that is all the data there is in your RRD. Your RRD is configured to automatically discard any data older than that.
Explanation
The graphs show that your RRD currently only has 6 days' worth of data to display. As you cannot graph data which you do not have, the graphs show what they do have, and nothing for the rest.
Your rrdtool info gives this for RRA definitions (trimmed for clarity):
step = 300
rra[0].cf = "MAX"
rra[0].rows = 1500
rra[0].pdp_per_row = 1
rra[0].xff = 5.0000000000e-01
This means that you have a single RRA, type MAX, which has 1pdp per row and 1500 rows.
As a result, your RRA is (step)x(pdp per row)x(number of rows) long, which is 1500x300 seconds, which is a little over 5 days.
Since your RRD only has a single RRA, all of your graph functions will use this one -- doing additional consolidation on the fly if necessary. Thus all your graphs use this single RRA.
However, your RRA is only 5-and-a-bit days long. Therefore, data will be expired and discarded when it is this old. As a result, only the last 5-and-a-bit days' worth of data are available at any time for graphing, which is what you see in the graphs.
Solution:
You need to keep the data for longer. There are two ways to do this --
Increase the length of the existing RRA
Create additional RRAs to hold the consolidated data for the lower-resolution graphs.
Option 1 is the simplest, as you can use rrdtool tune to grow the size of RRA number 0. However, it is very expensive in disk space (since you will be keeping the detailed data for the entire time period), plus it is expensive in CPU (RRDtool will have to consolidate on the fly when making yearly graphs). This option is only recommended if you really need the high-resolution data for the entire period -- such as if you are calculating 95th Percentiles, for example.
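For option 1, the resize could look roughly like this (assuming you want about a year of 5-minute rows, i.e. 366 x 288 = 105408 rows, so 103908 more than the current 1500; rrdtool resize writes its result to resize.rrd in the current directory, and newer versions, 1.5+, can alternatively grow an RRA in place via rrdtool tune):
# back up the original file first
rrdtool resize /root/graphs/Total_TPS.rrd 0 GROW 103908
mv resize.rrd /root/graphs/Total_TPS.rrd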
Option 2 is the best. You add a new RRA, with the same CF but more pdp_per_row, for each graph you will be wishing to create. For a Weekly graph, use pdp_per_row=6 (for a half-hour consolidation), for Monthly use 24 (two-hourly) and for Yearly, use 288 (daily consolidation). As time goes by, the data will be consolidated up into these new RRA, and the graph functions will use them in preference. This is less computationally expensive, and uses less disk space; however you lose the high-resolution data over time, and your historical data will not be automatically consolidated into the new RRAs. Also, you cannot just add a new RRA to an existing RRD file -- you will need to either create a new RRD, or use a tool such as rrdmerge.
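For option 2, a sketch of what a replacement RRD with those extra RRAs might look like (the DS line here is an assumption; copy the real DS definition from rrdtool info on your existing file):
# 5-min rows for ~5 days, 30-min for a week, 2-hour for a month, daily for a year
rrdtool create Total_TPS_new.rrd --step 300 \
DS:TPS:GAUGE:600:U:U \
RRA:MAX:0.5:1:1500 \
RRA:MAX:0.5:6:336 \
RRA:MAX:0.5:24:372 \
RRA:MAX:0.5:288:366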

RRDTool database definition and plotting the data - I need a second opinion

Here is what I am trying to achieve:
I read my data once a day (the exact time of the day is not very important).
I want to archive the values for this DS for two years back.
I need to be able to look back for 2 years and I need the value for every day
and I also need to see the weekly average
If I miss a reading for two consecutive days the data should be declared unknown
Here is what I am using for this:
rrdtool create Carsforsale.rrd --start 20130217 --step 86400 ^
DS:MidsizeCars:GAUGE:172800:U:U ^
DS:FullSizeCars:GAUGE:172800:U:U ^
RRA:AVERAGE:0:7:104^
RRA:LAST:0:7:1:720
I updated the above database with
rrdtool update Carsforsale.rrd 1361203200:554:791
rrdtool update Carsforsale.rrd 1361289600:556:795
These updates correspond to yesterday and the day before yesterday (18 and 19 Feb).
I tried to plot the graphs for the above using this
rrdtool graph "Inventory.png" \
--start "20130217" \
--imgformat PNG --width 850 --height 400 \
DEF:MidsizeCars=Carsforsale.rrd:MidsizeCars:AVERAGE \
DEF:FullSizeCars=Carsforsale.rrd:FullSizeCars:AVERAGE \
AREA:MidsizeCars#0000FF:"MidsizeCars" \
AREA:FullSizeCars#FF004D:"FullSizeCars:STACK"'
And now here are my questions:
Are the step and the heartbeat defined correctly for what I want to do?
Why are my graphs empty?
Looking into the database with the freeware utility called RRD Editor, I could see that the last values are stored in MidsizeCars and FullSizeCars, but the only archive that contains a history of what has been loaded into the database is the one with the LAST consolidation function. Am I supposed to plot LAST or AVERAGE to see the current values?
Thanks
C
Since you want to keep the data for two years at 1-day resolution, you have to set up an appropriate RRA for this purpose... Since this will only be about 730 values, I would not bother with setting up an extra consolidated RRA for the week; this will get calculated on the fly...
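A sketch of such a create, keeping the DS definitions and the two-day heartbeat from the question (backslash continuations here; on Windows cmd use ^):
rrdtool create Carsforsale.rrd --start 20130217 --step 86400 \
DS:MidsizeCars:GAUGE:172800:U:U \
DS:FullSizeCars:GAUGE:172800:U:U \
RRA:AVERAGE:0.5:1:730
With one reading per day, each row simply holds that day's value, and a weekly view can be consolidated on the fly at graph time.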