How is rrdtool PDP value calculated? - rrdtool

When I create an RRD database and update it with a GAUGE value of 100, the PDP value is set to 26.877900000. When I repeat this roughly a second later, the PDP value is 17.477500000:
usr#PC:~$ rm foo.rrd; rrdtool create foo.rrd --start 'N' --step '300' 'DS:RTT:GAUGE:600:0:1000000' 'RRA:AVERAGE:0.5:1:1440'; rrdtool update foo.rrd N:100; rrdtool dump foo.rrd | grep --color -E '<value>[0-9]+|<unknown_sec>|<lastupdate>'
<lastupdate>1551973741</lastupdate> <!-- 2019-03-07 17:49:01 EET -->
<value>2.6877900000e+01</value>
<unknown_sec> 241 </unknown_sec>
usr#PC:~$ rm foo.rrd; rrdtool create foo.rrd --start 'N' --step '300' 'DS:RTT:GAUGE:600:0:1000000' 'RRA:AVERAGE:0.5:1:1440'; rrdtool update foo.rrd N:100; rrdtool dump foo.rrd | grep --color -E '<value>[0-9]+|<unknown_sec>|<lastupdate>'
<lastupdate>1551973742</lastupdate> <!-- 2019-03-07 17:49:02 EET -->
<value>1.7477500000e+01</value>
<unknown_sec> 242 </unknown_sec>
usr#PC:~$
How is this PDP value calculated? My guess is that the first time, rrdtool update foo.rrd N:100 happened 268.779 ms after rrdtool create, and the second time it happened 174.775 ms after rrdtool create. Am I correct?

<value> contains the rate*seconds accumulated up to the last run.
The dump implementation is in rrd_dump.c:
CB_FMTS("\t\t<value>%0.10e</value>\n",
rrd.pdp_prep[i].scratch[PDP_val].u_val);
A description of pdp_prep[].scratch[PDP_val].u_val can be found in rrd_update.c#L1689:
/* in pdp_prep[].scratch[PDP_val].u_val we have collected
rate*seconds which occurred up to the last run.
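For a GAUGE data source the rate is simply the submitted value, so the accumulated rate*seconds matches your guess: 100 valid for about 0.269 s gives about 26.88. A quick sanity check in Python (just the arithmetic, not RRDTool code; the sub-second gaps are inferred from the dumped values, not measured):
# Arithmetic check only: for a GAUGE, rate == value, and <value>
# accumulates rate * seconds elapsed since the last update time.
rate = 100.0                     # the GAUGE value passed to rrdtool update
elapsed_1 = 0.268779             # assumed gap between create and the first update
elapsed_2 = 0.174775             # assumed gap between create and the second update
print(round(rate * elapsed_1, 4))   # 26.8779 -> matches 2.6877900000e+01 above
print(round(rate * elapsed_2, 4))   # 17.4775 -> matches 1.7477500000e+01 above
So yes, your interpretation is consistent with the dumped values.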

Related

How can I visualize timeseries data aggregated by more than one dimension on AWS insights?

I'd like to use CloudWatch Insights to visualize a multi-line graph of average latency by host over time, one line for each host.
This stats query extracts the latency and aggregates it in 10 minute buckets by host, but it doesn't generate any visualization.
stats avg(latencyMS) by bin(10m), host
bin(10m) | host | avg(latencyMS)
0m | 1 | 120
0m | 2 | 220
10m | 1 | 130
10m | 2 | 230
The docs call this out as a common mistake but don't offer any alternative.
The following query does not generate a visualization, because it contains more than one grouping field.
stats avg(myfield1) by bin(5m), myfield4
aws docs
Experimentally, CloudWatch will generate a multi-line graph if each record has multiple keys. A query that generates a line graph must return results like this:
bin(10m) | host-1 avg(latencyMS) | host-2 avg(latencyMS)
0m | 120 | 220
10m | 130 | 230
I don't know how to write a query that would output that.
Parse the individual messages for each host, then compute their stats.
For example, to get the average latency of responses from processes with PID=11 and PID=13:
parse @message /\[PID:11\].* duration=(?<pid_11_latency>\S+)/
| parse @message /\[PID:13\].* duration=(?<pid_13_latency>\S+)/
| display @timestamp, pid_11_latency, pid_13_latency
| stats avg(pid_11_latency), avg(pid_13_latency) by bin(10m)
| sort @timestamp desc
| limit 20
The regular expressions extract the duration for the processes with PID 11 and PID 13 into the fields pid_11_latency and pid_13_latency respectively, filling in null where there is no match.
You can build on this example by writing match regular expressions that extract the metrics from @message for the hosts you care about.

RRD Tool - confusing start time

I'm setting up an RRD database to store sensor data for 3 days in 12-hour intervals (43200 s) = 6 rows in the RRA.
rrdtool create test.rrd --step 43200 --start 1562429286 DS:temp:GAUGE:86400:U:U RRA:AVERAGE:0:1:6
The database's start time is 1562429286 (06.07.2019 - 18:08:06).
When I dump the database:
rrdtool dump test.rrd
it says (output trimmed for clarity):
2019-07-04 02:00:00 CEST / 1562198400 NaN
2019-07-04 14:00:00 CEST / 1562241600 NaN
2019-07-05 02:00:00 CEST / 1562284800 NaN
2019-07-05 14:00:00 CEST / 1562328000 NaN
2019-07-06 02:00:00 CEST / 1562371200 NaN
2019-07-06 14:00:00 CEST / 1562414400 NaN
I expected rrdtool to use the next nearest timestamp (6.7.19 18:00) as the last entry ("starting point") instead. So why is it at 14:00?
At first this explanation (How to create a rrd file with a specific time?) made perfect sense to me for the small interval of 5 m. But in my case I cannot follow the logic when the interval is bigger (12 h).
This is because the RRA buckets are always normalised to be aligned to the UTC timezone. This is not visible if you are using a CDP (consolidated data point) width of an hour or less; but in your case, your CDPs are 12 hours wide. Your timezone means that these are offset by 2 hours from UTC zero, resulting in apparent boundaries of 02:00 and 14:00 local time (if you were in London you'd see 00:00 and 12:00 as expected).
This effect is much more noticeable when you are using 1-day rollups and are located somewhere like New Zealand, where you'll see the CDP boundary appearing at noon rather than at midnight.
It is not currently possible to specify a different timezone to use as a base for the RRA buckets (this would make the data non-portable), though I believe it has been on the RRDTool feature request list for a number of years.
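To see where the 14:00 boundary comes from, you can redo the alignment arithmetic yourself. A minimal Python sketch (not RRDTool source, just the same modulo arithmetic on the step) reproduces the last row of the dump above:
# Bucket boundaries are multiples of the step counted from the Unix
# epoch (midnight UTC), not from local midnight.
from datetime import datetime, timezone, timedelta

step = 43200                        # 12 h, the --step from the question
start = 1562429286                  # the --start from the question

bucket_end = (start // step) * step
print(bucket_end)                                          # 1562414400
print(datetime.fromtimestamp(bucket_end, timezone.utc))    # 2019-07-06 12:00:00+00:00
print(datetime.fromtimestamp(bucket_end, timezone(timedelta(hours=2))))  # 2019-07-06 14:00:00+02:00 (CEST)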

How to find the hdfs files time stamp to milli seconds level

Is there a way to get the timestamps of files in HDFS at millisecond resolution?
For example:
In Linux we can get the full timestamp like below:
$ ls --full-time
total 4
-rw-r--r--. 1 bigdatauser hadoop 0 2017-09-15 01:09:25.068425282 -0400 newfile1.txt
-rwxrwxrwx. 1 bigdatauser hadoop 106 2017-09-15 01:08:16.791844270 -0400 test.sh
If you use hdfs dfs -stat '%Y' you can see the time in milliseconds.
$ hdfs dfs -touchz /tmp/test_file
$ hdfs dfs -stat "%Y" /tmp/test_file
1506621031648
From http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/FileSystemShell.html#stat:
Print statistics about the file/directory at <path> in the specified format. Format accepts filesize in blocks (%b), type (%F), group name of owner (%g), name (%n), block size (%o), replication (%r), user name of owner (%u), and modification date (%y, %Y). %y shows UTC date as “yyyy-MM-dd HH:mm:ss” and %Y shows milliseconds since January 1, 1970 UTC. If the format is not specified, %y is used by default.
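If you want the %Y output as a human-readable timestamp, a small Python conversion works (a plain epoch conversion, not an HDFS feature):
# Convert milliseconds-since-epoch (output of hdfs dfs -stat "%Y") to UTC.
from datetime import datetime, timezone

mtime_ms = 1506621031648
print(datetime.fromtimestamp(mtime_ms / 1000, timezone.utc))
# 2017-09-28 17:50:31.648000+00:00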

How to understand Primary Data Point(PDP) in rrdtool database?

If I dump an RRD to XML, then under the "PDP Status" section there are three elements: <last_ds>, <value> and <unknown_sec>. For example:
<!-- PDP Status -->
<last_ds>90</last_ds>
<value>4.2177496500e+03</value>
<unknown_sec> 184 </unknown_sec>
Now, as I understand it, each time I execute rrdtool update I update the Primary Data Point (PDP). It looks like whatever I pass as the value to rrdtool update (for example rrdtool update test.rrd "N:abc") is shown as the value of the <last_ds> element. However, how is the number for <value> calculated? I mean the number 4217.7496500 in the example above. Is it some kind of average? Last but not least, while I understand that <unknown_sec> shows the number of seconds for which the value of the DS has been unknown, this counter seems to wrap around at 280-295 seconds. How can this be explained? For example, if I execute while true; do rrdtool update test.rrd "N:75"; rrdtool dump test.rrd | grep "<unknown_sec>"; sleep 1; done where 75 is lower than the lowest value allowed for this DS, then the output is the following:
/* data not shown for brevity */
<unknown_sec> 280 </unknown_sec>
<unknown_sec> 281 </unknown_sec>
<unknown_sec> 282 </unknown_sec>
<unknown_sec> 0 </unknown_sec>
<unknown_sec> 1 </unknown_sec>
<unknown_sec> 2 </unknown_sec>
/* data not shown for brevity */
The PDP content of <value> is the sum of all products of the input value multiplied by the duration for which that value was valid. In order to build the PDP, at the end of the interval this sum is divided by the duration of the interval minus the number of unknown seconds ... The number of unknown seconds resets to 0 when a new interval is started ...
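A minimal Python sketch of that bookkeeping (my own illustration with made-up numbers, not RRDTool's code), for a GAUGE DS where the rate equals the submitted value:
# Illustration only: how <value> and <unknown_sec> evolve within one step.
import math

def accumulate(pdp_value, unknown_sec, rate, seconds):
    # Fold in one update that was valid for `seconds` seconds (NaN = unknown).
    if math.isnan(rate):
        return pdp_value, unknown_sec + seconds
    return pdp_value + rate * seconds, unknown_sec

def end_of_interval(pdp_value, unknown_sec, step):
    # Consolidate into a PDP: average over the known part, then reset both.
    known = step - unknown_sec
    pdp = pdp_value / known if known > 0 else float("nan")
    return pdp, 0.0, 0

# 184 s unknown, then 116 s at a constant value of 90 within a 300 s step:
value, unknown = accumulate(0.0, 0, float("nan"), 184)
value, unknown = accumulate(value, unknown, 90.0, 116)
pdp, value, unknown = end_of_interval(value, unknown, 300)
print(pdp)              # 90.0 -- the unknown seconds do not dilute the average
print(value, unknown)   # 0.0 0 -- both reset for the next interval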

rrd graph configurate query

I am updating my RRD file with some counts...
For example:
time: value:
12:00 120
12:05 135
12:10 154
12:20 144
12:25 0
12:30 23
13:35 36
Here my RRD is updating with the logic below:
((current value)-(previous value))/((current time)-(previous time))
e.g. (135 - 120) / 5 = 3
but my problem is that when a 0 comes in, the reading will be negative:
(0 - 144) / 5
Here a "0" value only occurs on a failure of the system from which the data is fetched. The graph must not display this reading.
How can I configure it so that when a 0 comes in, it does not update the RRD graph (i.e. skip the reading (0 - 144) / 5), and the next time it takes a reading like (23 - 0) / 5 rather than (23 - 144) / 10?
When defining the data sources at RRD creation time, you can specify which range of values is acceptable.
DS:data_source:GAUGE:10:1:U will only accept values of 1 or above.
So if you get a 0 during an update, rrd will replace it with unknown, and I assume it can find a way to discard it.
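A rough illustration of that min/max filtering (my own sketch, not RRDTool internals): any incoming value outside the DS range is stored as unknown, so the failure reading never produces a data point:
# Sketch only: a DS min bound of 1 (as in DS:data_source:GAUGE:10:1:U)
# turns the failure reading of 0 into "unknown" (NaN).
def clamp_to_range(value, min_val=1.0, max_val=float("inf")):
    # Values outside [min_val, max_val] become NaN, i.e. unknown.
    if not (min_val <= value <= max_val):
        return float("nan")
    return value

readings = [120, 135, 154, 144, 0, 23, 36]
print([clamp_to_range(v) for v in readings])
# [120, 135, 154, 144, nan, 23, 36]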