rrd4j archive type - rrdtool

I can't manage to create an archive with the correct type.
What am I missing?
My example is very similar to the official example on https://code.google.com/p/rrd4j/wiki/Tutorial
RRD creation:
rrdDef.setStartTime(L - 300);
rrdDef.addDatasource("speed", DsType.GAUGE, 600, Double.NaN, Double.NaN);
rrdDef.addArchive(ConsolFun.MAX, 0.5, 1, 24);
rrdDef.addArchive(ConsolFun.MAX, 0.5, 6, 10);
I add some values: (1,2,3 for each step)
long x = L;
while (x <= L + 4200) {
Sample sample = rrdDb.createSample();
sample.setAndUpdate((x + 11) + ":1");
sample.setAndUpdate((x + 12) + ":2");
sample.setAndUpdate((x + 14) + ":3");
x += 300;
}
And then I fetch it:
FetchRequest fetchRequest = rrdDb.createFetchRequest(ConsolFun.MAX, (L - 600), L + 4500);
FetchData fetchData = fetchRequest.fetchData();
String s = fetchData.dump();
I get the result: (hoping to find the maximum)
920804100: NaN
920804400: NaN
920804700: +1.0000000000E00
920805000: +1.0166666667E00
920805300: +1.0166666667E00
...
920808600: +1.0166666667E00
920808900: +1.0166666667E00
920809200: NaN
I would like to see the maximum value here. I tried it with total as well, and I get the same result.
What do I have to change so that I get the greatest value sent in one step, or the sum of the values sent in one step?
Thanks

MAX is not the maximum input value but the maximum consolidated data point. What you're telling rrd in your example is:
At one point in time I'm going 1MPH
One second later I'm going 2MPH
Two seconds later I'm going 3MPH
rrd now has 3 data points covering 3 seconds of a 300 second interval. What should rrd store? 1, 2, or 3? None of the above: it has to normalize the data in some way to say that between X and X+STEP the rate is Y.
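The 1.0166666667 in the question's output is consistent with this kind of normalization. Here is a minimal Python sketch, assuming each GAUGE update is treated as holding from the previous update up to its own timestamp, that a step's value is the time-weighted average, and that the loop's x values fall on step boundaries. Heartbeat and unknown-data handling are ignored, and `gauge_step_average` is a hypothetical helper, not part of rrdtool or rrd4j.

```python
def gauge_step_average(updates, step_start, step_end):
    """Time-weighted average of GAUGE updates over [step_start, step_end].

    `updates` is a time-sorted list of (timestamp, value); each value is
    assumed to hold from the previous update up to its own timestamp.
    """
    weighted_sum = 0.0
    prev_t = None
    for t, value in updates:
        if prev_t is not None:
            # Clip the span covered by this value to the step being computed.
            lo = max(prev_t, step_start)
            hi = min(t, step_end)
            if hi > lo:
                weighted_sum += value * (hi - lo)
        prev_t = t
    return weighted_sum / (step_end - step_start)


x = 920805000  # a step boundary taken from the question's output
updates = [
    (x - 300 + 14, 3),  # last update of the previous loop iteration
    (x + 11, 1),
    (x + 12, 2),
    (x + 14, 3),
    (x + 300 + 11, 1),  # first update of the next loop iteration
]
average = gauge_step_average(updates, x, x + 300)
print(f"{average:.10f}")  # 1.0166666667
```

Under these assumptions the value 1 covers 297 of the 300 seconds, so the 2 and the 3 barely move the stored data point, and MAX over a single such point returns the same number.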
To complicate matters, it's not certain that your 3 data points land in the same 300 second interval. Your first 2 data points could be in one interval and the 3MPH could be in the next one. This is because the first stored data point is not exactly start+step, i.e. if you start at 14090812456 it might be something like 14090812700 even though your step is 300.
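As a rough illustration of where the first stored data point falls, here is a small Python sketch. It assumes stored timestamps are whole multiples of the step in epoch seconds; `first_boundary_after` is a hypothetical helper name.

```python
def first_boundary_after(timestamp, step):
    """Return the first multiple of `step` strictly after `timestamp`."""
    return (timestamp // step + 1) * step


start = 1409080000
print(first_boundary_after(start, 300))  # 1409080200
```

With a start of 1409080000 and a step of 300, this gives 1409080200, which is start+200 rather than start+300.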
The only way to store exact input values with GAUGE is to push updates at the exact step times at which rrd stores the data points: I'm going 1MPH at x, 2MPH at x+300, 3MPH at x+600, where x starts at the first data point.
Here is a bash example showing this working with your rrd settings. I'm using a constant start time, and x starts at what I know is rrd's first data point.
L=1409080000
rrdtool create max.rrd --start=$L DS:speed:GAUGE:600:U:U RRA:MAX:0.5:1:24 RRA:MAX:0.5:6:10
x=$(($L+200))
while [ $x -lt $(($L+3000)) ]; do
rrdtool update max.rrd "$(($x)):1"
rrdtool update max.rrd "$(($x+300)):2"
rrdtool update max.rrd "$(($x+600)):3"
x=$(($x+900))
done
rrdtool fetch max.rrd MAX -r 600 -s 1409080000
speed
1409080200: 1.0000000000e+00
1409080500: 2.0000000000e+00
1409080800: 3.0000000000e+00
1409081100: 1.0000000000e+00
1409081400: 2.0000000000e+00
1409081700: 3.0000000000e+00
1409082000: 1.0000000000e+00
Not really that useful, but if you increase the resolution to, say, 1200 seconds, you start getting the max over larger time intervals:
rrdtool fetch max.rrd MAX -r 1200 -s 1409080000
speed
1409081400: 3.0000000000e+00
1409083200: 3.0000000000e+00
1409085000: nan
1409086800: nan
1409088600: nan

Related

One hour graphs generated one hour before UK BST -> GMT show 'nan'

During the last UK BST/GMT changeover I observed an odd thing on a production box: the 1 hour graphs generated by rrdtool were showing -nan, and querying the rrd directly returned the same, which then led to a load of alerts being generated for things appearing to be down.
The rrd was valid and was updating when checked with rrdtool info:
rrd_version = "0003"
step = 60
last_update = 1572136453
header_size = 2912
ds[ds0].index = 0
ds[ds0].type = "COUNTER"
ds[ds0].minimal_heartbeat = 600
ds[ds0].min = 0.0000000000e+00
ds[ds0].max = 1.0000000000e+05
ds[ds0].last_ds = "554"
ds[ds0].value = 5.5084745763e+00
ds[ds0].unknown_sec = 0
ds[ds1].index = 1
ds[ds1].type = "COUNTER"
ds[ds1].minimal_heartbeat = 600
ds[ds1].min = 0.0000000000e+00
ds[ds1].max = 1.0000000000e+05
ds[ds1].last_ds = "0"
ds[ds1].value = 0.0000000000e+00
ds[ds1].unknown_sec = 0
However calling:
rrdtool graph dummy --start=end-300s DEF:x=timedelta.rrd:ds0:AVERAGE VDEF:xa=x,AVERAGE PRINT:xa:%lf
returned:
0x0
-nan
If I extended the window of --start=end-300s to something like 300000s I got a valid response: 5.596822
Graphs rendered via PHP for the hour showed as blank and all the time divisions along the bottom vanished. However, I have 6 hour and longer graphs generated from the same rrd with the same PHP code that were still updating (e.g. the 6 hour graph was showing valid data but lagging behind by about 10 min, presumably due to the way it's generated).
Have I found a bug, or is this some odd behaviour to be expected during the DST changeover? (as a note I don't remember seeing this when going from GMT -> BST, and I don't remember what happened last year).
rrdtool-1.4.8-9.el7.x86_64
rrdtool-devel-1.4.8-9.el7.x86_64
php-7.3.11-1.el7.remi.x86_64
php-pecl-rrd-2.0.1-6.el7.remi.7.3.x86_64
And yes, the system locale is correctly configured.

pyplot - yticks data representation help - need to convert in KB/MB

I am trying to plot a graph of throughput numbers.
My data is: x axis = time in epoch, y = throughput in bytes.
I have y-ticks as
print loc, labels
[ 0. 5000000. 10000000. 15000000. 20000000. 25000000.
30000000. 35000000.]<a list of 8 Text yticklabel objects>
I want to show this data in KB or MB. How can I go about it?
I am lost and stuck. Currently the y axis runs from 0 to 3.5 with a 1e7 multiplier, which is not a sensible way to show throughput.
So the y ticks are 0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5 with 1e7.
Appreciate help!
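One possible approach is to format each tick value yourself. The sketch below is plain Python and only shows the label conversion. It assumes 1 MB = 10**6 bytes (use 1024 * 1024 if you prefer binary units), and `bytes_to_mb` is a hypothetical name.

```python
def bytes_to_mb(value, pos=None):
    """Format a tick value given in bytes as megabytes.

    Assumes 1 MB = 10**6 bytes. The unused `pos` argument matches the
    (value, position) call shape used by matplotlib tick formatters.
    """
    return f"{value / 1e6:g} MB"


ticks = [0.0, 5000000.0, 10000000.0, 35000000.0]
labels = [bytes_to_mb(t) for t in ticks]
print(labels)  # ['0 MB', '5 MB', '10 MB', '35 MB']
```

With matplotlib, a function like this can be wrapped in `matplotlib.ticker.FuncFormatter(bytes_to_mb)` and passed to `ax.yaxis.set_major_formatter(...)`, which replaces the 1e7 offset notation with the formatted labels.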

RRD DB fake value generator

I want to generate fake values in an RRD DB for a period of 1 month, with 5 seconds as the data collection frequency. Is there any tool which would fill an RRD DB with fake data for a given time duration?
I Googled a lot but did not find any such tool.
Please help.
I would recommend the following one liner:
perl -e 'my $start = time - 30 * 24 * 3600; print join " ","update","my.rrd",(map { ($start+$_*5).":".rand} 0..(30*24*3600/5))' | rrdtool -
this assumes you have an rrd file called my.rrd and that it contains just one data source expecting GAUGE type data.
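For readers who prefer Python, the same argument list can be sketched as follows. This only builds the `timestamp:value` strings; handing them to rrdtool is left out. `fake_updates` is a hypothetical helper, and the sketch assumes, like the one-liner, a single GAUGE data source.

```python
import random
import time


def fake_updates(start, duration, step):
    """Yield 'timestamp:value' strings every `step` seconds, both ends included."""
    for offset in range(0, duration + 1, step):
        yield f"{start + offset}:{random.random()}"


month = 30 * 24 * 3600
start = int(time.time()) - month
args = list(fake_updates(start, month, 5))
print(len(args))  # 518401
```

The strings would follow `update my.rrd` exactly as in the one-liner. For the updates to be accepted, my.rrd presumably has to be created with a 5 second step and a start time earlier than the first timestamp.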

Calculating the distance between characters

Problem: I have a large number of scanned documents that are linked to the wrong records in a database. Each image has the correct ID on it somewhere that says where it belongs in the db.
I.E. A DB row could be:
| user_id | img_id | img_loc |
| 1 | 1 | /img.jpg|
img.jpg would have the user_id (1) on the image somewhere.
Method/Solution: Loop through the database. Pull the image text in to a variable with OCR and check if user_id is found anywhere in the variable. If not, flag the record/image in a log, if so do nothing and move on.
My example is simple, in the real world I have a guarantee that user_id wouldn't accidentally show up on the wrong form (it is of a specific format that has its own significance)
Right now it is working. However, it is incredibly strict. If you've worked with OCR you understand how fickle it can be. Sometimes a 7 = 1 or a 9 = 7, etc. The result is a large number of false positives, especially among images with low quality scans.
I've addressed some of the image quality issues with some processing on my side - increase image size, adjust the black/white threshold and had satisfying results. I'd like to add the ability for the prog to recognize, for example, that "81*7*23103" is not very far from "81*9*23103"
The only way I know how to do that is to check strings at least as long as what I'm looking for: calculate the distance between each pair of characters, compute the average, and set a limit on what counts as a good average.
Some examples:
Ex 1
81723103 - Looking for this
81923103 - Found this
--------
00200000 - distances between characters
0 + 0 + 2 + 0 + 0 + 0 + 0 + 0 = 2
2/8 = .25 (pretty good match. 0 = perfect)
Ex 2
81723103 - Looking
81158988 - Found
--------
00635885 - distances
0 + 0 + 6 + 3 + 5 + 8 + 8 + 5 = 35
35/8 = 4.375 (Not a very good match. 9 = worst)
This way I can tell it "Flag the bottom 30% only" and dump anything with an average distance > 6.
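The calculation from the two examples can be sketched in Python as below. This is only an illustration of the scheme described above, with hypothetical function names. It assumes the IDs are plain ASCII digit strings and slides a window over the OCR text to find the closest candidate.

```python
def digit_distance(expected, found):
    """Average absolute difference between digits at the same position."""
    if len(expected) != len(found):
        raise ValueError("strings must have the same length")
    total = sum(abs(int(a) - int(b)) for a, b in zip(expected, found))
    return total / len(expected)


def best_match(expected, text):
    """Return (score, window) for the closest all-digit window in text, or None."""
    n = len(expected)
    best = None
    for i in range(len(text) - n + 1):
        window = text[i:i + n]
        if not (window.isascii() and window.isdigit()):
            continue  # skip windows containing anything but 0-9
        score = digit_distance(expected, window)
        if best is None or score < best[0]:
            best = (score, window)
    return best


print(digit_distance("81723103", "81923103"))  # 0.25
print(digit_distance("81723103", "81158988"))  # 4.375
print(best_match("81723103", "ID: 81923103 scanned"))  # (0.25, '81923103')
```

Skipping windows that are not all digits keeps the number of comparisons down, which may help with the run time concern.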
I figure I'm reinventing the wheel and wanted to share this for feedback. I see a huge increase in run time and a performance hit doing all these string operations over what I'm currently doing.

How to create a DS to store the accumulated result of another DS with rrdtool

I want to create an rrd file with two data sources included. One stores the original value of the data; name it 'dc'. The other stores the accumulated result of 'dc'; name it 'total'. The expected formula is current(total) = previous(total) + current(dc). For example, if I update the rrd file with the data sequence (2, 3, 5, 4, 9), I want 'dc' to be (2, 3, 5, 4, 9) and 'total' to be (2, 5, 10, 14, 23).
I tried to create the rrd file with the command line below. The command fails and says that PREV is not supported with DS COMPUTE.
rrdtool create test.rrd --start 920804700 --step 300 \
DS:dc:GAUGE:600:0:U \
DS:total:COMPUTE:PREV,dc,ADDNAN \
RRA:AVERAGE:0.5:1:1200 \
RRA:MIN:0.5:12:2400 \
RRA:MAX:0.5:12:2400 \
RRA:AVERAGE:0.5:12:2400
Is there an alternative manner to define the DS 'total' (DS:total:COMPUTE:PREV,dc,ADDNAN) ?
rrdtool does not store 'original' values ... it rather samples the signal you provide via the update command at the rate you defined when you set up the database ... in your case 1/300 Hz
that said, a total does not make much sense ...
what you can do with a single DS though, is build the average value over a time range and multiply the result with the number of seconds in the time range and thus arrive at the 'total'.
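A small Python sketch of that arithmetic, assuming the fetched values are per-second rates with one value per step, and that unknown (NaN) steps are simply left out of both the average and the time range. `total_from_rates` is a hypothetical helper.

```python
import math


def total_from_rates(rates, step):
    """Estimate a total as (average rate) x (seconds covered), skipping NaN steps."""
    known = [r for r in rates if not math.isnan(r)]
    if not known:
        return math.nan
    average = sum(known) / len(known)
    seconds = step * len(known)
    return average * seconds


rates = [2.0, 3.0, 5.0, 4.0, 9.0]
total = total_from_rates(rates, 300)
print(total)
```

For five 300 second steps at these rates, this comes to roughly 6900 units, i.e. the average rate of 4.6 multiplied by 1500 seconds.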
Sorry a bit late but may be helpful for someone else.
Better to use the 'mrtg-traffic-sum' package. I'm using it with an rrd that has a GAUGE DS and LAST as the RRAs' consolidation function, and it lets me collect monthly traffic volumes and quota limits.
eg: Here is a basic Traffic chart with no traffic quota.
root#server:~# /usr/bin/mrtg-traffic-sum --range=current --units=MB /etc/mrtg/R4.cfg
Subject: Traffic total for '/etc/mrtg/R4.cfg' (1.9) 2022/02
Start: Tue Feb 1 01:00:00 2022
End: Tue Mar 1 00:59:59 2022
Interface In+Out in MB
------------------------------------------------------------------------------
eth0 0
eth1 14026
eth2 5441
eth3 0
eth4 15374
switch0.5 12024
switch0.19 151
switch0.49 1
switch0.51 0
switch0.92 2116
root#server:~#
From this you can then write up a script that will generate a new rrd which stores these values & presto you have a traffic volume / quota graph.
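A first step for such a script could be parsing the report into numbers. The Python sketch below assumes the output layout is exactly as in the sample above (a dashed separator followed by `name value` rows); `parse_traffic_sum` is a hypothetical helper, and storing the values in a new rrd is left out.

```python
def parse_traffic_sum(report):
    """Return {interface: total} from the rows that follow the dashed separator."""
    totals = {}
    in_table = False
    for line in report.splitlines():
        if line.startswith("---"):
            in_table = True
            continue
        if not in_table:
            continue
        parts = line.split()
        # Keep only 'name number' rows; prompts and blank lines are ignored.
        if len(parts) == 2 and parts[1].isdigit():
            totals[parts[0]] = int(parts[1])
    return totals


sample = """Interface In+Out in MB
------------------------------------------------------------------------------
eth0 0
eth1 14026
switch0.5 12024
root#server:~#
"""
totals = parse_traffic_sum(sample)
print(totals)  # {'eth0': 0, 'eth1': 14026, 'switch0.5': 12024}
```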
Example fixed traffic volume chart using GAUGE
This thread reminded me to fix this collector that had stopped & just got around to posting ;)