How to understand Primary Data Point (PDP) in an rrdtool database?

If I dump an RRD to XML, then under the "PDP Status" section there are three elements: <last_ds>, <value> and <unknown_sec>. For example:
<!-- PDP Status -->
<last_ds>90</last_ds>
<value>4.2177496500e+03</value>
<unknown_sec> 184 </unknown_sec>
Now, as I understand it, each time I execute "rrdtool update" I update the Primary Data Point (PDP). It looks like whatever I pass as the value to rrdtool update (for example rrdtool update test.rrd "N:abc") is shown as the value of the <last_ds> element. However, how is the number for <value> calculated? I mean the number 4217.7496500 in the example above. Is this some kind of average?
Last but not least, while I understand that <unknown_sec> shows the number of seconds during which the value of the DS has been unknown, this counter seems to wrap around at 280 - 295 seconds. How can this be explained? For example, if I execute
while true; do rrdtool update test.rrd "N:75"; rrdtool dump test.rrd | grep "<unknown_sec>"; sleep 1; done
where 75 is lower than the lowest value allowed for this DS, then the output is the following:
/* data not shown for brevity */
<unknown_sec> 280 </unknown_sec>
<unknown_sec> 281 </unknown_sec>
<unknown_sec> 282 </unknown_sec>
<unknown_sec> 0 </unknown_sec>
<unknown_sec> 1 </unknown_sec>
<unknown_sec> 2 </unknown_sec>
/* data not shown for brevity */

The PDP content of <value> is the sum of all products of the input value multiplied by the duration for which that value was valid. In order to build the PDP, at the end of the interval this sum is divided by the duration of the interval minus the number of unknown seconds ... The number of unknown seconds resets to 0 when a new interval is started ... That reset is why the counter in your loop drops back to 0 partway through: a new step interval has just begun.
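A minimal sketch of that bookkeeping in C, assuming a 300-second step (the struct and function names are made up for illustration; this is not rrdtool's actual code):

#include <math.h>
#include <stdio.h>

/* Illustration only: accumulate rate*seconds plus unknown seconds within one
 * step, then divide at the step boundary, as described in the answer above. */
struct pdp_state {
    double value;       /* sum of rate * seconds collected so far in this step */
    long   unknown_sec; /* seconds of this step during which the DS was unknown */
};

/* Feed one sample that was valid for `seconds` seconds; NAN marks unknown
 * input (a missing update, or a value outside the DS min/max range). */
static void pdp_feed(struct pdp_state *p, double rate, double seconds)
{
    if (isnan(rate))
        p->unknown_sec += (long) seconds;
    else
        p->value += rate * seconds;
}

/* At the end of a step of `step` seconds, build the primary data point and
 * reset both counters for the next interval. */
static double pdp_finish(struct pdp_state *p, long step)
{
    double pdp = (p->unknown_sec >= step)
                     ? NAN
                     : p->value / (double) (step - p->unknown_sec);
    p->value = 0.0;
    p->unknown_sec = 0;   /* this reset is what makes <unknown_sec> drop to 0 */
    return pdp;
}

int main(void)
{
    struct pdp_state p = { 0.0, 0 };
    pdp_feed(&p, NAN, 184);   /* 184 unknown seconds, as in the dump above */
    pdp_feed(&p, 90.0, 46);   /* a rate of 90 valid for 46 s -> value += 4140 */
    printf("accumulated value=%g unknown_sec=%ld\n", p.value, p.unknown_sec);
    printf("pdp at the end of a 300 s step = %g\n", pdp_finish(&p, 300));
    return 0;
}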

Related

How to extract time value from cloudwatch logs to perform math operations

I have logs like the following in AWS CloudWatch:
2020-05-04 14:45:37.453 [http-nio-8095-exec-9] [34mINFO [0;39m xxx - Execution time of Class.methodNameOne :: 23 ms
2020-05-04 14:45:37.475 [http-nio-8095-exec-7] [34mINFO [0;39m xxx - Execution time of Class.methodNameTwo :: 32 ms
2020-05-04 14:45:37.472 [http-nio-8095-exec-3] [34mINFO [0;39m xxx - Execution time of Class.methodNameOne :: 38 ms
I created a metric using the pattern below to obtain the time value for the method methodNameOne only.
[..., method = class.methodNameOne, , time!=null,]
And I see methodNameOne in the $method column, and the values 23 and 38 as rows in the $time column.
lineno  $method        $time
1       methodNameOne  23
2       methodNameOne  38
Later I created an alarm on the same metric, using the maximum math function.
The alarm is not performing the max operation on the metric results.
I want to calculate the maximum of the time values from the logs, and the alarm needs to be triggered when it crosses a specified threshold value.

RRD Tool - confusing start time

I'm setting up an RRD database to store sensor data for 3 days in 12-hour intervals (43200 s) = 6 rows in the RRA.
rrdtool create test.rrd --step 43200 --start 1562429286 DS:temp:GAUGE:86400:U:U RRA:AVERAGE:0:1:6
The database's starting time is 1562429286 (06.07.2019 18:08:06).
When I dump the database:
rrdtool dump test.rrd
it says (output trimmed for clarity):
2019-07-04 02:00:00 CEST / 1562198400 NaN
2019-07-04 14:00:00 CEST / 1562241600 NaN
2019-07-05 02:00:00 CEST / 1562284800 NaN
2019-07-05 14:00:00 CEST / 1562328000 NaN
2019-07-06 02:00:00 CEST / 1562371200 NaN
2019-07-06 14:00:00 CEST / 1562414400 NaN
I expected rrdtool to give the next nearest timestamp (6.7.19 18:00) as the last entry ("starting point") instead. So why is it at 14:00?
At first this explanation (How to create a rrd file with a specific time?) made perfect sense to me for the small interval of 5 minutes. But in my case I cannot follow the logic when the interval is bigger (12 h).
This is because the RRA buckets are always normalised to be aligned to the GMT (UTC) timezone. It is not visible if you are using a CDP (consolidated data point) width of an hour or less; but in your case, your CDPs are 12 hours in width. Your timezone means that these are offset by 2 hours from UTC zero, resulting in apparent boundaries of 02 and 14 local time (if you were in London then you'd be seeing 0 and 12 as expected).
This effect is much more noticeable when you are using 1-day rollups and are located somewhere like New Zealand, where you'll see the CDP boundary appearing at noon rather than at midnight.
It is not currently possible to specify a different timezone to use as a base for the RRA buckets (this would make the data non-portable), though I believe it has been on the RRDTool feature request list for a number of years.
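You can reproduce the alignment yourself by truncating the start time down to a multiple of the step in UTC. A minimal sketch in C (nothing rrdtool-specific, just the modulo arithmetic):

#include <stdio.h>
#include <time.h>

int main(void)
{
    time_t start = 1562429286;                /* --start value from the question */
    long   step  = 43200;                     /* 12-hour CDP width */
    time_t boundary = start - (start % step); /* align down to a UTC step boundary */

    char buf[64];
    strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S UTC", gmtime(&boundary));
    printf("last bucket boundary: %ld (%s)\n", (long) boundary, buf);
    /* prints 1562414400, i.e. 2019-07-06 12:00:00 UTC, which is the
     * 2019-07-06 14:00:00 CEST / 1562414400 row seen in the dump */
    return 0;
}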

How is rrdtool PDP value calculated?

When I create an RRD database and update it with a GAUGE value of 100, the PDP <value> is set to 26.877900000. When I create an RRD database roughly a second later, the PDP value is 17.477500000:
usr#PC:~$ rm foo.rrd; rrdtool create foo.rrd --start 'N' --step '300' 'DS:RTT:GAUGE:600:0:1000000' 'RRA:AVERAGE:0.5:1:1440'; rrdtool update foo.rrd N:100; rrdtool dump foo.rrd | grep --color -E '<value>[0-9]+|<unknown_sec>|<lastupdate>'
<lastupdate>1551973741</lastupdate> <!-- 2019-03-07 17:49:01 EET -->
<value>2.6877900000e+01</value>
<unknown_sec> 241 </unknown_sec>
usr#PC:~$ rm foo.rrd; rrdtool create foo.rrd --start 'N' --step '300' 'DS:RTT:GAUGE:600:0:1000000' 'RRA:AVERAGE:0.5:1:1440'; rrdtool update foo.rrd N:100; rrdtool dump foo.rrd | grep --color -E '<value>[0-9]+|<unknown_sec>|<lastupdate>'
<lastupdate>1551973742</lastupdate> <!-- 2019-03-07 17:49:02 EET -->
<value>1.7477500000e+01</value>
<unknown_sec> 242 </unknown_sec>
usr#PC:~$
How is this PDP value calculated? My guess is that the first time, rrdtool update foo.rrd N:100 happened 268.779 ms after rrdtool create, and the second time it happened 174.775 ms after rrdtool create. Am I correct?
<value> contains the rate*seconds which occurred up to the last run.
Dump implementation here: rrd_dump.c
CB_FMTS("\t\t<value>%0.10e</value>\n",
rrd.pdp_prep[i].scratch[PDP_val].u_val);
A description of pdp_prep[].scratch[PDP_val].u_val can be found here: rrd_update.c#L1689
/* in pdp_prep[].scratch[PDP_val].u_val we have collected
rate*seconds which occurred up to the last run.
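Working backwards from the dumps above supports that guess. The stored value is the rate multiplied by the (sub-second) time for which it was known, so, assuming the rate of 100 covered the whole period between create and update:
2.6877900000e+01 / 100 = 0.268779 s, i.e. the first update came about 268.8 ms after create
1.7477500000e+01 / 100 = 0.174775 s, i.e. the second update came about 174.8 ms after create
The <unknown_sec> values fit the same picture: with a 300-second step, 1551973741 mod 300 = 241 and 1551973742 mod 300 = 242, which is the portion of the current step, before the update arrived, for which the DS had no data.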

Return Calculations Incorrect in Panel Data

I'm currently working with panel data in Stata, and run the following commands to define the panel:
encode ticker, generate(ticker_n)
xtset ticker_n time
Where ticker is a string (the ticker of a listed company on a stock exchange), and time is an integer going from 930 (opening of the market) to 1559 (closing of the market). Thus, time here indicates the minutes during which the stock exchange is open. For each minute the stock market is open we have the close prices of all tickers listed on the stock exchange. A sample of the data looks like this:
date time open high low close volume ticker ticker_n
09/15/2008 930 33.31 33.31 33.31 33.31 2135 zeus zeus
09/15/2008 931 32.94 32.94 32.94 32.94 100 zeus zeus
09/15/2008 930 10.21 10.21 10.21 10.21 4270 bx bx
09/15/2008 931 10.46 10.5 10.42 10.44 5700 bx bx
Then, in an attempt to calculate returns (using the close price) I run the following command:
gen return = (close - l.close) / l.close
However, this leads to a weird issue where, at every whole hour (time = 1100, 1200, 1300, etc.), the returns are not calculated at all and Stata just reports a "-" for the returns.
Now I assume something went wrong in defining the panel data, such that Stata does not recognize that the observation before 1500 should be 1459 (it looks for 1499 I assume?).
Hence, my question is, how do I correctly define my panel data such that Stata recognizes that my time axis is in minutes? I did not find anything in the official Stata documentation that helped me out here.
Indeed: your time variable is messing you up mightily. If time is going from 1059 to 1100, or from 1159 to 1200, each of those is a jump of 41 to Stata. The value for the time previous to 1100 would have been at time 1099, which won't be in your data; hence previous values for 1100, etc., will all be missing. There is no sense whatsoever in which Stata will look at 1100 and say "Oh! that's a time and so the previous time would have been 1059 and I should use the value for 1059". Applying a time display format wouldn't change that failure to see the times as you understand them.
You don't explain how daily dates are supposed to enter your analysis. Here is some technique for times in hours and minutes alone.
clear
input time
930
931
959
1000
1001
1059
1100
end
gen double mytime = dhms(0, floor(time/100), mod(time, 100), 0)
format mytime %tcHH:MM
gen id = 1
xtset id mytime, delta(60000)
list mytime L.mytime, sep(0)
     +-------------------+
     |             L.    |
     |  mytime   mytime  |
     |-------------------|
  1. |   09:30        .  |
  2. |   09:31    09:30  |
  3. |   09:59        .  |
  4. |   10:00    09:59  |
  5. |   10:01    10:00  |
  6. |   10:59        .  |
  7. |   11:00    10:59  |
     +-------------------+

rrd graph configuration query

I am updating my RRD file with some counts...
For example:
time: value:
12:00 120
12:05 135
12:10 154
12:20 144
12:25 0
12:30 23
13:35 36
Here my RRD is updated with the logic below:
((current value) - (previous value)) / ((current time) - (previous time))
e.g. (135 - 120) / 5 = 15
But my problem is that when a 0 comes in, the reading will be negative:
(0 - 144) / 5
Here the "0" value only occurs on a system failure (at the source the data is fetched from). This reading must not be displayed in the graph.
How can I configure it so that when a 0 comes it does not update the "RRD graph" (skip this reading, (0 - 144) / 5), and the next time it takes a reading like (23 - 0) / 5 but not (23 - 144) / 10?
When specifying the data sources while creating the RRD, you can specify which range of values is acceptable.
DS:data_source:GAUGE:10:1:U will only accept values of 1 or more.
So if you get a 0 during an update, rrd will replace it with unknown, and I assume it can then find a way to discard it.
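As a rough sketch of what that minimum bound does (illustration only, not rrdtool's code): any sample below the configured minimum is turned into NaN/unknown, so it can never produce a negative reading in the graph.

#include <math.h>
#include <stdio.h>

/* Illustration only: mimic a DS defined as DS:data_source:GAUGE:10:1:U,
 * i.e. minimum 1 and no maximum. Out-of-range samples become unknown. */
static double clamp_to_range(double value, double min, double max)
{
    if (!isnan(min) && value < min) return NAN;   /* below minimum -> unknown */
    if (!isnan(max) && value > max) return NAN;   /* above maximum -> unknown */
    return value;
}

int main(void)
{
    double samples[] = { 120, 135, 154, 144, 0, 23 };   /* values from the question */
    for (int i = 0; i < 6; i++) {
        double v = clamp_to_range(samples[i], 1, NAN);
        printf("%g -> %s\n", samples[i],
               isnan(v) ? "unknown (not graphed)" : "accepted");
    }
    return 0;
}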