I am trying to create a COMMAND JSON datasource to monitor some values, for example from a script like this:
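import json
from random import random

# Emit one data point named 'random' in the JSON format the datasource parses.
print(json.dumps({
    'values': {
        '': {'random': random()},
    },
    'events': []
}))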
When I start zencommand, the appropriate RRD file is created, but the cur, avg and max values on the graph show NaN. Those NaNs are replaced by actual numbers when I zoom in to the current point in time, which is not very far from the start of monitoring.
Why doesn't it show correct min, max and avg values before I zoom in? Is that somehow related to consolidation? I read http://www.vandenbogaerdt.nl/rrdtool/min-avg-max.php, but that page doesn't say anything about NaN values.
And is there any way to zoom in to the current timestamp more quickly, so that I can see some data sooner?
When you are zoomed out, you'll be looking at the lower-granularity RRAs (Round Robin Archives). These do not get populated until enough data are in the higher-granularity ones; so, for example, if you have a 5min-granularity RRA, a 1hr-granularity RRA, and a 1day-granularity RRA, and have collected data for the last 45min, then you will see ~8 data points in your 'daily' graph (which uses the 5min RRA), but nothing in your 'monthly' (which will use the 1hr RRA) or your 'yearly' (which uses the 1day RRA).
This applies to any RRA; AVG, LAST, MAX, etc. Until the consolidated time window is complete, and the full complement of Primary Data Points has been collected for consolidation, the consolidated data point value is undefined.
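As a rough illustration of the point above, the sketch below counts how many consolidated data points each RRA would hold after 45 minutes. It assumes the hypothetical 5min/1hr/1day layout from the example and that collection started exactly on a step boundary; the function name and the layout table are invented for illustration and are not part of RRDtool.

```python
def completed_cdps(elapsed_seconds, step, pdp_per_cdp):
    """Count the fully consolidated data points available after elapsed_seconds.

    A consolidated point only exists once its whole window has elapsed.
    """
    return elapsed_seconds // (step * pdp_per_cdp)


STEP = 300          # base step of 5 minutes, as in the example above
ELAPSED = 45 * 60   # 45 minutes of collected data

# Hypothetical RRA layout: name -> primary data points per consolidated point
RRAS = {"5min": 1, "1hr": 12, "1day": 288}

for name, pdp_per_cdp in RRAS.items():
    print(name, completed_cdps(ELAPSED, STEP, pdp_per_cdp))
```

Only the 5min RRA has any points here; the 1hr and 1day RRAs are still empty, which is why the zoomed-out graphs show NaN.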
RRDtool picks the RRA to use based on the requested graph data width and pixel width, as well as the requested consolidation functions. Although there are ways to force RRDtool to use a higher-granularity RRA than it needs to, and to consolidate on the fly, this is inefficient and slow. It also makes having the lower-granularity RRA pointless and throws away one of the major benefits of RRDtool (that it performs consolidation at update time, making graphing faster).
We are using amCharts 4 to show trend logs, and sometimes we end up with a lot of data that has to go into the chart. We'd like to know the maximum number of data points that a chart can handle, so we know how much data to aggregate (to reduce the data point count) before sending it into the package. To show the most accurate representation of the data possible, we don't want to aggregate more aggressively than we have to. Our charts are x/y charts with value vs. date/time for up to 8 series.
In one case, we have a data set with well in excess of 600,000 data points in 8 series, and loading this into the chart, even in batches (i.e., loading one batch in, then adding the remaining batches to it in turn), will cause the charting package to run out of memory. In the case cited here, during our test, the charting package ran out of memory on the third batch, where the total of the 3 batches exceeded 600,000 data points, preventing further batches from being loaded in. For large sites that use our product, it is quite common to have that much data that the user wants to see in a chart if they want to see 6 months or a year's worth of data; so it's important that we be able to show some kind of representation of all that data, which is where aggregation comes in.
I am using the Google business API. Suddenly all API requests stopped, and the graph shows this:
REDUCE_PERCENTILE_99
Any idea what this means, please?
Thank you.
After research, I found that a Reducer operation describes how to aggregate data points from multiple time series into a single time series, where the value of each data point in the resulting series is a function of all the already aligned values in the input time series.
REDUCE_PERCENTILE_99: Reduce by computing the 99th percentile of data points across time series for each alignment period. This reducer is valid for GAUGE and DELTA metrics of numeric and distribution type. The value of the output is DOUBLE.
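To make the reducer idea concrete, here is a minimal sketch that takes several already aligned time series and produces one series holding the 99th percentile for each alignment period. It uses the nearest-rank percentile definition, which is an assumption: the exact interpolation the monitoring service uses is not stated above. The function names and sample values are invented for illustration.

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return float(ordered[rank - 1])


def reduce_percentile(series, pct=99):
    """Combine aligned series into one series, one output value per period."""
    # zip(*series) groups the values that share the same alignment period.
    return [percentile_nearest_rank(period, pct) for period in zip(*series)]


# Three made-up aligned series with three alignment periods each
aligned = [
    [10, 12, 11],
    [20, 18, 19],
    [30, 35, 5],
]
print(reduce_percentile(aligned))  # [30.0, 35.0, 19.0]
```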
Currently I'm experimenting a bit with RRDTool. I'm aware that accuracy decreases as the selected time period gets longer, but I thought I could bypass this with my datasource settings.
For example, temperature and humidity from my house, resolution 1h:
And now with a resolution of 1d:
As you can see, there is a great difference in the max. value of the blue line.
I created my datasources and archives with these values:
"rrdtool create temp.rrd --step 30",
"DS:temp:GAUGE:60:U:U",
"DS:humidity:GAUGE:60:U:U",
"RRA:AVERAGE:0.5:1:1051200",
"RRA:MAX:0.5:1:1051200",
"RRA:MIN:0.5:1:1051200",
I thought that 1051200 rows (1 year = 31536000 s, divided by the 30 s resolution = 1051200) is correct for saving every value for a year, and that there should be no need for interpolation.
Is it possible to get the exact values displayed even if the resolution changes (for example the max humidity (Luftfeuchtigkeit) at 99.9%)?
Here are my values for image creation:
"--start" => "-1h", (-1d etc-)
"--title" => "Haustemperatur",
"--vertical-label" => "°C / % RLF",
"--width" => 800,
"--height" => 600,
"--lower-limit" => "-5",
"DEF:temperatur=$rrdFile:temperatur:LAST",
"DEF:humidity=$rrdFile:humidity:LAST",
"LINE1:temperatur#33CC33:Temperatur",
"GPRINT:temperatur:LAST:\t\tAktuell\: %4.2lf °C",
"GPRINT:temperatur:AVERAGE:Schnitt\: %4.2lf °C",
"GPRINT:temperatur:MAX:Maximum\: %4.2lf °C\j",
"LINE1:humidity#0000FF:Relative Luftfeuchtigkeit",
"GPRINT:humidity:LAST:Aktuell\: %4.2lf %%",
"GPRINT:humidity:AVERAGE:Schnitt\: %4.2lf %%",
"GPRINT:humidity:MAX:Maximum\: %4.2lf %%\j",
Thanks for your help and any suggestions.
P.S. I'm using a library to generate the graphs and the database, please do not be surprised about possible syntax errors.
Your problem is that you are causing the values to be rolled-up on the fly at graph time, but have not correctly specified which rollup function to use. Your second graph is showing the MAXIMUM of the LAST in the interval, not the true Maximum.
There are a few issues to explain with this configuration:
Firstly, your RRD is defined using 3 RRAs with 1cdp=1pdp and different consolidation functions (AVG, MIN, MAX). This means they are functionally identical, but they do not save you any time at graphing as they have not done any pre-rollup for you! You should definitely consider having just one of these (probably AVG) and adding others at lower resolution to help speed up graphing when you have a bigger time window.
Secondly, you need to specify the on-the-fly rollup function. When graphing, RRDTool will work out the best RRA to use based on your DEF lines, and will perform any additional consolidation required on the fly. This can take a long time if the only available RRA is too high-granularity.
Your graph request uses DEF:temperatur=$rrdFile:temperatur:LAST but you do not actually have a LAST type RRA, so RRDTool will grab the last average. Your RRA data points are at 30s interval, but your second graph has (approx) 5min per pixel, meaning that RRDTool needs to grab the 10 entries from the RRA, and print the last. Looking at the data in the top graph, it seems that the last in that interval was the 66 value, though previous ones were 100.
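The effect can be sketched in a few lines of Python. The humidity samples below are made up to resemble the situation described (mostly near 100, ending at 66); they are not the asker's real data, and the helper names are hypothetical. Ten 30s samples are collapsed into one 5min pixel using different rollup functions.

```python
def consolidate(samples, per_pixel, func):
    """Collapse each run of per_pixel samples into one value using func."""
    return [func(samples[i:i + per_pixel])
            for i in range(0, len(samples), per_pixel)]


def last(chunk):
    """Rollup that keeps only the final sample of the chunk."""
    return chunk[-1]


def average(chunk):
    """Rollup that keeps the mean of the chunk."""
    return sum(chunk) / len(chunk)


# Ten made-up 30s humidity samples covering one 5min pixel
humidity = [100, 100, 100, 99, 98, 90, 80, 75, 70, 66]

print(consolidate(humidity, 10, last))     # [66]
print(consolidate(humidity, 10, max))      # [100]
print(consolidate(humidity, 10, average))  # [87.8]
```

With LAST, the peak of 100 disappears from the wider graph, so a MAX taken afterwards can only report 66.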
So you have a choice. Do you want the graph to show the average for the time period, the maximum, or both? Do you want the figures at the bottom to show the maximum of the average, or the maximum of everything?
For example:
"DEF:temperatur=$rrdFile:temperatur:AVERAGE",
"DEF:humidity=$rrdFile:humidity:AVERAGE",
"DEF:temperaturmax=$rrdFile:temperatur:MAX:reduce=MAX",
"DEF:humiditymax=$rrdFile:humidity:MAX:reduce=MAX",
"LINE1:temperatur#33CC33:Temperatur",
"LINE1:temperaturmax#66EE66:Maximum Temperatur",
"GPRINT:temperatur:LAST:\t\tAktuell\: %4.2lf °C",
"GPRINT:temperatur:AVERAGE:Schnitt\: %4.2lf °C",
"GPRINT:temperaturmax:MAX:Maximum\: %4.2lf °C\j",
"LINE1:humidity#0000FF:Relative Luftfeuchtigkeit",
"LINE1:humiditymax#3333FF:Maximum Luftfeuchtigkeit",
"GPRINT:humidity:LAST:Aktuell\: %4.2lf %%",
"GPRINT:humidity:AVERAGE:Schnitt\: %4.2lf %%",
"GPRINT:humiditymax:MAX:Maximum\: %4.2lf %%\j",
In this case, we define a separate DEF for the maximum data set, so that we can always obtain the highest value even after consolidation. This is also used in the GPRINT so that we get the MAX of the MAX rather than the MAX of the AVERAGE. The Maximum line is now drawn separately to the average line, so that we can see the effect of any rollup of data - the lines will be together at high-resolution but get further apart as the time window widens and resolution decreases.
The DEF is set to force any rollup function used for the maxima to be MAX rather than AVG, so we can be sure to get the maximum rather than the average of the maxima.
We are also using AVERAGE rather than LAST in order to get more meaningful data after rollup. Note that we could also use a separate DEF for LAST if we wanted to, though it is less useful.
Note that, if you ever expect to be generating graphs over more than a few days, you should definitely consider adding some lower-resolution RRAs for AVERAGE and MAX or else the graphs will generate very slowly. RRDTool is designed with the intention that data will be rolled up over time, rather than (as in a traditional database) every sample kept as-is. So, unless you really need to have 30s resolution data kept for an entire year, you may prefer to keep this high resolution data for only a week, and then have separate RRAs that roll up to 1 hour resolution and keep for longer. Many people keep the 30s for 2 days, then 30min-summary for 2 weeks, 2h-summary for 2 months, and then 1day-summary for 2 years.
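The retention scheme mentioned above can be turned into RRA definitions with a little arithmetic. The sketch below assumes the 30s step from the question, an xff of 0.5 as in the original create command, and approximates 2 months as 60 days and 2 years as 730 days; the helper name rra is hypothetical.

```python
STEP = 30     # seconds per primary data point, as in the question
DAY = 86400   # seconds per day


def rra(cf, resolution_s, retention_s, xff=0.5):
    """Build an RRA definition string for one resolution and retention period."""
    pdp_per_cdp = resolution_s // STEP   # primary points rolled into each row
    rows = retention_s // resolution_s   # rows needed to cover the retention
    return f"RRA:{cf}:{xff}:{pdp_per_cdp}:{rows}"


# (resolution, retention) pairs: 30s/2 days, 30min/2 weeks, 2h/2 months, 1day/2 years
TIERS = [
    (30, 2 * DAY),
    (1800, 14 * DAY),
    (7200, 60 * DAY),
    (DAY, 730 * DAY),
]

for resolution, retention in TIERS:
    for cf in ("AVERAGE", "MAX"):
        print(rra(cf, resolution, retention))
```

This prints eight RRA lines, starting with RRA:AVERAGE:0.5:1:5760, which together hold far fewer rows than a single year-long 30s archive.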
For more information, see the RRDTool manual pages.
How can I get the second-to-last update from an RRD file using rrdtool?
With the command rrdtool lastupdate we can get only the most recent update. I want to get the one before that.
Can anyone tell me how?
If you mean to get the actual data submitted, then you cannot do this. Remember that RRDTool stores normalised and consolidated data, not the raw data.
rrdtool lastupdate will give you the point in time and raw data value(s) of the last actual update, before normalisation and consolidation. This is stored so that ongoing rates can be calculated. After the next update, this data is normalised and consolidated, so it is no longer available.
You can use rrdtool fetch to obtain the last entries in any RRA (after normalisation and consolidation). You can specify which RRA to use by giving the requested data resolution and consolidation function. Depending on the nature of your data (Gauge vs. Counter) and the time of submission (on the interval boundary or not), this may or may not be the same as the value you submitted.
So, in summary, if you have a 5-minute-interval RRD, with a 1cdp=1pdp AVG RRA, and you submit data at 11:59, 12:04 and 12:08, then lastupdate will give you "12:08" plus the data value(s) submitted; fetch will give you "12:00" (the start of the only completed 5-min time bucket) plus the normalised data for the 12:00-12:05 bucket.
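The bucket arithmetic in that summary can be checked with a small sketch. Times are expressed as seconds since midnight to keep the example free of time zones, and the helper names are invented for illustration.

```python
def last_complete_bucket(update_time, step):
    """Return (start, end) of the most recent bucket that has fully elapsed."""
    end = (update_time // step) * step   # latest interval boundary reached
    return end - step, end


def hhmm(seconds):
    """Format seconds since midnight as HH:MM."""
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}"


STEP = 300                           # 5-minute interval
last_update = 12 * 3600 + 8 * 60     # the update submitted at 12:08

start, end = last_complete_bucket(last_update, STEP)
print(hhmm(start), hhmm(end))        # 12:00 12:05
```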
I created a standard RRDTool database with a default step of 5 min (300s).
It holds different types of values. Some are gauges, which are easily processed, but there are other values I would like to store as COUNTER, and here is my problem:
I read the data in a program, and taking the difference between the values over two steps works, but the counter increments more slowly than time passes (it can increase by less than 300 during a step), so my output value is wrong.
Is it possible to change the COUNTER so that it is not a number per second but per step, or something like that? If not, I suppose I have to calculate the difference in my program?
Thank you for helping.
RRDTool is capable of handling fractional values, so there is no problem if the counter increments by less than the seconds interval since the last update.
RRDTool stores everything as a Rate. If your DS is of type GAUGE, then RRDTool assumes that the incoming value is already a rate, and only applies Data Normalisation (more on this later). If the type is COUNTER or DERIVE, then the value/timepoint you are updating with is compared to the previous value/timepoint to obtain a rate thus: r=(x2 - x1)/(t2 - t1). The rate obtained is then Normalised. The other DS type is ABSOLUTE, which assumes the counter was reset on the last read, giving r=x2/(t2 - t1).
The Normalisation step adjusts the data point based on assuming a linear progression from the last data point so that it lies exactly on an interval boundary. For example, if your step is 5min, and you update at 12:06, the data point is adjusted back to what it would have been at 12:05, and stored against 12:05. However the last unadjusted DP is still preserved for use at the next update, so that overall rates are correct.
So, if you have a 300s (5min) interval, and the value increased by 150, the rate stored will be 0.5.
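The two rate formulas and the 150-in-300s example can be written out as a short sketch. The function names and counter values are invented for illustration, and counter wrap-around handling is deliberately left out.

```python
def counter_rate(x1, t1, x2, t2):
    """Rate for COUNTER/DERIVE style data: r = (x2 - x1) / (t2 - t1)."""
    return (x2 - x1) / (t2 - t1)


def absolute_rate(x2, t1, t2):
    """Rate for ABSOLUTE style data: r = x2 / (t2 - t1)."""
    return x2 / (t2 - t1)


STEP = 300  # seconds

# A made-up counter that grew from 1000 to 1150 over one 300s step
rate = counter_rate(1000, 0, 1150, STEP)
print(rate)          # 0.5 per second
print(rate * STEP)   # 150.0 per step
```

Multiplying the stored per-second rate by the step length recovers the per-step increase, so the difference does not have to be calculated before updating. In a graph, the same multiplication could be done with a CDEF in RPN such as perstep=rate,300,* (the variable names here are illustrative).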
If the value you are graphing is something small, e.g. 'number of pages printed', this might seem counterintuitive, but it works well for large rates such as network traffic counters (which is what RRDTool was designed for).
If you really do not want to display fractional values in the generated graphs or output, then you can use a format string such as %.0f to enforce no decimal places and the displayed number will be rounded to the nearest integer.