One-off and no-data-problem - rrdtool

I'm just getting started using RRDtool to collect climate data. I don't use the graph functionality, but rather use "fetch" to retrieve data. I then use another graphing solution (flot) to display the data, and that seems to work somewhat. But I had some small problems and decided to check the details of the update and fetching and was suddenly not so sure that things worked as I expected.
So I've created a tiny shell script that creates a database, puts a single value in it, and then prints the contents:
#!/bin/sh
RRD=test.rrd
STEP=300
HB=600
# Remove previous database to be sure that
# old data does not affect the test
rm -f $RRD
# Create database
rrdtool create $RRD \
--start 2999999999 --step $STEP \
DS:a:GAUGE:$HB:U:U \
RRA:AVERAGE:0.5:1:1000
# Do a single update
rrdtool update $RRD \
3000000400:123
# Fetch data and print to stdout
rrdtool fetch $RRD \
--start 3000000000 --end 3000000900 AVERAGE
I would expect this to print three (or perhaps four, not sure about the last one) values like this:
3000000000: -nan
3000000300: 123
3000000600: -nan
3000000900: -nan
But this is what I get:
3000000300: -nan
3000000600: -nan
3000000900: -nan
3000001200: -nan
So I've three questions:
Why does the fetch command start at 300, instead of 0?
Why does the fetch command include not only the last step (900) but also one more (1200)?
Why was the updated value not accepted?

The timeslot b contains information valid for b-step up to b EXCLUDING b itself. Hence when asking for data from 3000000000 to 3000000900 the first entry you get is 3000000300.
Since you are asking for data to end at 3000000900, you get the entry for 3000001200 as well, because 3000000900 itself is the start of that entry.
At the moment, even in GAUGE mode, you need a known value to start from, so your first known update simply brings the data source back into a known state; it does not yet establish anything else. One might argue that in GAUGE mode this could be done differently.
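The slot arithmetic behind the first two answers can be sketched in plain shell. This is only an illustration: fetch_rows is a hypothetical helper, not an rrdtool command, and the rounding rule is an assumption inferred from the fetch output shown in the question (start rounded down to a step boundary, end pushed one step past its boundary).

```shell
#!/bin/sh
# Sketch: list the row timestamps a fetch request would report.
# fetch_rows is a hypothetical helper, not part of rrdtool.
# Arguments: start end step (all in seconds)
fetch_rows() {
    start=$1 end=$2 step=$3
    # Round the start down to a step boundary; the first row is the
    # slot that ends one step after that boundary.
    first=$(( start - start % step + step ))
    # Push the end one step past its boundary, so the slot that
    # begins at `end` is reported as well.
    last=$(( end - end % step + step ))
    t=$first
    while [ "$t" -le "$last" ]; do
        echo "$t"
        t=$(( t + step ))
    done
}

# The request from the question
fetch_rows 3000000000 3000000900 300
```

For the request in the question this lists 3000000300 through 3000001200, the same four timestamps that rrdtool printed.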

Related

RRDTOOL - RPN-Expression help, how to reference input in COMPUTE DS (add two DS values or the LAST DS value with a new number)?

I have a data feed that has a single value that increases over time until a forced wrap-around.
I have the wrap-around under control.
The value from the data feed I pass into a RRD GAUGE as ds1.
I want to add a couple of data sources to handle exceptions: when my script (which calls rrdupdate) detects a certain condition, it should record some extra details for reporting.
When the condition is true in the script, I want to update the RRD with:
the normal value into ds1
the difference of the prior value to the current value to be marked as batch exceptions into ds2
count (sum) all ds2 values in a similar way to ds1.
I've been playing with the below but wonder if there is a method using COMPUTE or do I need to code all the logic into the bash script to poll rrdinfo, fetch the last_ds lines and prep the data accordingly? Does the rrd COMPUTE type have the ability to read other DS's?
If ds2.value > 0 then set ds3.value to (ds3.last_ds + ds2.value) ?
I looked at the rpn-expression and found it references 'input' but does not show how to feed those inputs into the COMPUTE operation?
eg:
Current state
DS:ds1:GAUGE:28800:0:U
DS:ds2:COUNTER:1800:0:U
DS:ds3:GAUGE:1800:0:U
RRA:LAST:0.99999:1:72000
RRA:LAST:0.99999:4:17568
RRA:LAST:0.99999:8:18000
RRA:LAST:0.99999:32:4500
RRA:LAST:0.99999:96:1825
Desired state?
DS:ds1:GAUGE:28800:0:U
DS:ds2:COUNTER:1800:0:U
DS:ds3:COMPUTE:1800:0:U
DS:cs1:COMPUTE:input,0,GT,ds3,ds2,+,input,IF <-- what is 'input' is it passed via rrdupdate cs1:[value]?
RRA:LAST:0.99999:1:72000
RRA:LAST:0.99999:4:17568
RRA:LAST:0.99999:8:18000
RRA:LAST:0.99999:32:4500
RRA:LAST:0.99999:96:1825
Alternatively ds1 could have store the total without the exceptions and I could use an AREA and a STACK to plot the total.
If someone is knowledgeable about rpn-expressions as used with rrd, it would be a massive help to clarify the rpn-expression 'input' reference and what is possible. There is very limited info online about this. If the script has to poll the RRD files for last_ds and do the calculations, that is fine, but if RRD has the smarts in the COMPUTE DS type, I'd rather use them.
Thank you.
A COMPUTE type datasource needs to have an RPN formula that completely describes it in terms of the other (non-compute) datasources. So, you cannot have multiple definitions of the same source, nor can it populate until the last of the other DS for that time window have been populated.
So, for example, if you have datasources a and b, and you want a COMPUTE type datasource that is equal to a if b<0, and a+b otherwise, you can use
DS:a:COUNTER:1800:0:U
DS:b:GAUGE:1800:0:U
DS:c:COMPUTE:b,0,GT,a,b,+,a,IF
From this, you can see how the definition of c uses RPN to define a single value using the values of a and b (and a constant). The calculation is performed solely within the configured time interval, and subsequently all three are stored and aggregated in the defined RRAs in the same way. You can also then use the graph functions over c exactly as you would for a or b; the compute function is used only at data storing time.
Here is a full working example for the benefit of the original poster:
rrdtool create test.rrd --step 1800 \
DS:a:COUNTER:28800:0:U \
DS:b:COUNTER:28000:0:U \
DS:c:GAUGE:3600:0:U \
DS:d:COUNTER:3600:0:U \
DS:x:COMPUTE:b,0,GT,a,b,+,a,IF \
RRA:LAST:0.99999:1:72000 \
RRA:LAST:0.99999:4:17568 \
RRA:LAST:0.99999:8:18000 \
RRA:LAST:0.99999:32:4500 \
RRA:LAST:0.99999:96:1825
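To make the RPN in that definition concrete, here is a rough shell mirror of what `b,0,GT,a,b,+,a,IF` evaluates to. It is only a sketch: compute_c is a hypothetical helper, it works on plain integers, and it ignores how RRDtool handles unknown values or converts COUNTER inputs to rates before the expression is applied.

```shell
#!/bin/sh
# Sketch: the value produced by the RPN "b,0,GT,a,b,+,a,IF".
# compute_c is a hypothetical helper for illustration only.
# Arguments: a b (integers)
compute_c() {
    a=$1 b=$2
    if [ "$b" -gt 0 ]; then
        echo $(( a + b ))   # b,0,GT is true: take a,b,+
    else
        echo "$a"           # b,0,GT is false: take a
    fi
}

compute_c 5 3    # b > 0
compute_c 5 0    # b = 0
compute_c 5 -2   # b < 0
```

Read left to right, the expression pushes the comparison result, then the "true" value (a+b), then the "false" value (a), and IF picks one of the two.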

How to put a 100% TICK or VRULE for unknown data with RRDTool

I'm plotting a number of data on various graphs using RRDTool, occasionally I get unknown data points, this is totally expected especially if the computer updating the RRDs is offline.
That's cool; however, when this happens, I want there to be a nice big red line for each and every unknown, so that the graph's viewer is very aware that the value at those points is not 0, but actually UNKNOWN.
What I have:
What I want (Photoshopped):
Is there an easy/elegant way to accomplish this?
Here's what worked:
I used the CDEF with an existing Data Source (DS) instead of having to create a new DS.
I added the following 2 lines in my RRDTool Graph section
'CDEF:up=a1,0,*,0,EQ,0,1,IF' \
'TICK:up#DB0865:1.0' \
The CDEF does the calculation of:
a1 * 0
Then it compares the result of that to 0. If they're equal, "up" is set to "0"; otherwise "up" is set to "1".
The only time that they would not be equal, would be if "a1" was unknown.
Therefore when there is a gap in the graph (no data), it will have a 100% vertical bar (TICK) of a deep purple/pink colour (#DB0865)
Even though the documentation on the RRDTool site indicates that a DS can be added to an existing RRD, it actually cannot be (according to Tobi Oetiker). So I went with the above method to avoid losing all the data in the RRDs that I already have, which would happen if I created a new RRD with a new DS.
Here's an example of how it looks:
The elegant way would be to check whether load contains any reasonable value. If not, write 1 to a DS that you create for this purpose.
So in the Round Robin Database, add a new DS that will hold the value 0 or 1:
DS:somestatus1:GAUGE:600:U:U
Then start writing 0 or 1 to this DS, depending on whether your primary DS is available.
Finally, for drawing the graph:
DEF:somestatus1=$RRD_FILE:somestatus1:AVERAGE \
CDEF:my_status_cdef=somestatus1,1,0,IF \
TICK:my_status_cdef#e0ffe0:1.0:"Device was ON\n" \
Each TICK will draw a 100% height vertical bar over the graph, as you need.
Another option is to create a conditional CDEF that produces a TICK when the primary DS is unknown.
This method plots an area when "offline". The CDEF checks if the load measurement is UN (unknown), if it is, it will return 1, multiply by INF to make it to the highest value of the plot.
CDEF:offline=load,UN,INF,* \
AREA:offline#FF000011: \

rrdtool update multiple datasources in two commands

RRD does not update the second datasource correctly, see:
First, I create the RRD file with two datasources (c1 and c2):
rrdtool create test.rrd --start N --step 60 DS:c1:GAUGE:120:0:100 DS:c2:GAUGE:120:0:100 RRA:AVERAGE:0.5:1:1440
Then I do update the two datasources in two commands:
rrdtool update test.rrd -t c1 N:10 && rrdtool update test.rrd -t c2 N:10
Wait for 60 seconds....
Then update again:
rrdtool update test.rrd -t c1 N:20 && rrdtool update test.rrd -t c2 N:20
Then let's see what we have:
rrdtool fetch test.rrd AVERAGE | tail -5
1468409580: -nan -nan
1468409640: -nan -nan
1468409700: -nan -nan
1468409760: 1,5988575517e+01 1,9266620475e-01
1468409820: -nan -nan
The first data source c1 works as expected, but the second, c2, shows a value lower than 1, where I also expect a value close to 15.
Yes, I know I can also update both data sources in ONE update command, but in my case I have a lot of data sources in one rrd file, and separate commands make the mass of values easier to read and follow.
Used rrd version : 1.6.0
This is, of course, Data Normalisation. It is also caused by your updating the two datasources in two separate calls.
If you instead use:
rrdtool update test.rrd -t c1:c2 N:10:10
rrdtool update test.rrd -t c1:c2 N:20:20
then you will be updating both DSs at the same time. You see, when you do it in separate updates, what you're actually doing is implicitly updating the other DS with 'unknown' and then relying on the automatic interpolation to fill things in. RRDTool is not a relational database, and you cannot update values in a timewindow independently without affecting the other values.
The other issue is Data Normalisation, where values are adjusted temporally to fit into the exact time boundaries and in doing so, the values are adjusted to be linearly equivalent... the practical upshot when using network traffic (big numbers) is pretty much the same, and the overall totals and averages are consistent, but smaller point-in-time values end up as decimals like this.
So, two things:
Update your DS all together, not in separate calls
Try to update exactly on the time boundary. (Instead of using 'N', use an exact time, rounded to the nearest minute.)
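Both suggestions can be combined in a small sketch. Assumptions: align_to_step is a hypothetical helper, the 60-second step matches the create command in the question, and the timestamp is rounded down to the boundary rather than to the nearest minute, which is a simplification of the advice above. The script only prints the update command instead of running it.

```shell
#!/bin/sh
# align_to_step is a hypothetical helper: round a Unix timestamp down
# to the start of its step interval.
# Arguments: timestamp step (both in seconds)
align_to_step() {
    echo $(( $1 - $1 % $2 ))
}

now=$(date +%s)
ts=$(align_to_step "$now" 60)

# One call that updates both data sources at an aligned timestamp.
# Printed rather than executed, so the sketch runs without rrdtool.
echo "rrdtool update test.rrd -t c1:c2 $ts:10:10"
```

Since rrdtool rejects an update whose timestamp is not newer than the previous one, a scheme like this allows at most one update per step.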

rrdtool displays different values than entered

Can anybody explain to me why I get different values when I fetch from my rrd-db than what I filled it with?
Here are the commands:
a. Create database
rrdtool create temperature.rrd --step 300 -b 1374150100 \
DS:temp:GAUGE:300:N:N \
RRA:AVERAGE:0:1:5
b. Fill with data
rrdtool update temperature.rrd \
1374150400:6 \
1374150700:8 \
1374151000:4 \
1374151300:4
c. Fetch data
rrdtool fetch temperature.rrd AVERAGE --start 1374150099 --end 1374151301
Output:
temp
1374150300: 6.0000000000e+00
1374150600: 7.3333333333e+00
1374150900: 5.3333333333e+00
1374151200: 4.0000000000e+00
1374151500: -nan
I fill the database with data at exactly the step period. I have no idea why it displays 7.3 and 5.3.
Did I miss something?
OK, I got it. The problem is that the start time is not aligned with the step interval. As you can see in the output, the steps begin with ...300, then ...600 and so on, while I filled in the data at ...400, ...700 etc.
So the solution is to insert the values on the correct step boundaries, and then it works.
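The 7.3 and 5.3 can be reproduced by hand as time-weighted averages, which is a reasonable way to see what the misalignment does. In this sketch slot_average is a hypothetical helper, and the split of each 300-second slot into 100 and 200 seconds follows from the update times in the question (each update value covers the interval since the previous update).

```shell
#!/bin/sh
# slot_average is a hypothetical helper: time-weighted mean of two
# values that each cover part of one step.
# Arguments: seconds1 value1 seconds2 value2
slot_average() {
    LC_ALL=C awk -v s1="$1" -v v1="$2" -v s2="$3" -v v2="$4" \
        'BEGIN { printf "%.10f\n", (s1 * v1 + s2 * v2) / (s1 + s2) }'
}

# Slot ending at ...600: 100 s covered by the update at ...400 (value 6),
# then 200 s covered by the update at ...700 (value 8).
slot_average 100 6 200 8

# Slot ending at ...900: 100 s at value 8, then 200 s covered by the
# update at ...1000 (value 4).
slot_average 100 8 200 4
```

The two results, 7.3333333333 and 5.3333333333, match the fetched rows for 1374150600 and 1374150900.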

RRDTool database definition and plotting the data - I need a second opinion

Here is what I am trying to achieve:
I read my data once a day (the exact time of the day is not very important).
I want to archive the values for this DS for two years back.
I need to be able to look back for 2 years and I need the value for every day
and I also need to see the weekly average
If I miss a reading for two consecutive days the data should be declared unknown
Here is what I am using for this:
rrdtool create Carsforsale.rrd --start 20130217 --step 86400 ^
DS:MidsizeCars:GAUGE:172800:U:U ^
DS:FullSizeCars:GAUGE:172800:U:U ^
RRA:AVERAGE:0:7:104^
RRA:LAST:0:7:1:720
I updated the above database with
rrdtool update Carsforsale.rrd 1361203200:554:791
rrdtool update Carsforsale.rrd 1361289600:556:795
The updates correspond to yesterday and the day before yesterday (18, 19 Feb).
I tried to plot the graphs for the above using this
rrdtool graph "Inventory.png" \
--start "20130217" \
--imgformat PNG --width 850 --height 400 \
DEF:MidsizeCars=Carsforsale.rrd:MidsizeCars:AVERAGE \
DEF:FullSizeCars=Carsforsale.rrd:FullSizeCars:AVERAGE \
AREA:MidsizeCars#0000FF:"MidsizeCars" \
AREA:FullSizeCars#FF004D:"FullSizeCars:STACK"'
And now here are the my questions:
Are the step and the heartbeat defined correctly for what I want to do?
Why are my graphs empty ?
Looking into the database with a freeware utility called RRD Editor, I could see that the last values are stored for MidsizeCars and FullSizeCars, but the only archive that contains a history of what has been loaded into the database is the LAST one. Am I supposed to plot LAST or AVERAGE to see the current values?
Thanks
C
Since you want to keep the data for two years at 1 day resolution, you have to set up an appropriate RRA for this purpose ... since this will only be about 730 values, I would not bother with setting up an extra consolidated RRA for the week. This will get calculated on the fly ...
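As a rough sketch of that sizing, the row count can be derived from the retention period. rra_spec is a hypothetical helper, and the xff value of 0.5 is an assumption (the question used 0); the step of one day comes from the create command in the question.

```shell
#!/bin/sh
# rra_spec is a hypothetical helper: print an RRA definition that keeps
# `days` days of history when each row consolidates `steps` daily values.
# Arguments: consolidation-function steps days
rra_spec() {
    cf=$1 steps=$2 days=$3
    # Round up so a partial final row is still covered
    rows=$(( (days + steps - 1) / steps ))
    # xff of 0.5 is an assumption, not taken from the question
    echo "RRA:$cf:0.5:$steps:$rows"
}

# Two years of daily values
rra_spec AVERAGE 1 730
```

The printed definition, RRA:AVERAGE:0.5:1:730, would take the place of the two RRA lines in the create command.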