Substitute Scientific Notation from bytes to Megabytes [closed] - rrdtool

I have an .xml file (an rrdtool dump) that looks like this:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rrd SYSTEM "http://oss.oetiker.ch/rrdtool/rrdtool.dtd">
<!-- Round Robin Database Dump -->
<rrd>
<version>0003</version>
<step>60</step> <!-- Seconds -->
<lastupdate>1674125860</lastupdate> <!-- 2023-01-19 10:57:40 UTC -->
<ds>
<name> 1 </name>
<type> GAUGE </type>
<minimal_heartbeat>8460</minimal_heartbeat>
<min>NaN</min>
<max>NaN</max>
<!-- PDP Status -->
<last_ds>954298368</last_ds>
<value>3.8171934720e+10</value>
<unknown_sec> 0 </unknown_sec>
</ds>
<!-- Round Robin Archives -->
<rra>
<cf>AVERAGE</cf>
<pdp_per_row>1</pdp_per_row> <!-- 60 seconds -->
<params>
<xff>5.0000000000e-01</xff>
</params>
<cdp_prep>
<ds>
<primary_value>8.5981579947e+08</primary_value>
<secondary_value>0.0000000000e+00</secondary_value>
<value>NaN</value>
<unknown_datapoints>0</unknown_datapoints>
</ds>
</cdp_prep>
<database>
<!-- 2023-01-17 10:58:00 UTC / 1673953080 --> <row><v>NaN</v></row>
<!-- 2023-01-17 10:59:00 UTC / 1673953140 --> <row><v>NaN</v></row>
<!-- 2023-01-17 11:00:00 UTC / 1673953200 --> <row><v>NaN</v></row>
<!-- 2023-01-17 11:01:00 UTC / 1673953260 --> <row><v>NaN</v></row>
<!-- 2023-01-17 11:02:00 UTC / 1673953320 --> <row><v>NaN</v></row>
<!-- 2023-01-17 11:03:00 UTC / 1673953380 --> <row><v>NaN</v></row>
<!-- 2023-01-18 12:00:00 UTC / 1674043200 --> <row><v>NaN</v></row>
<!-- 2023-01-18 18:00:00 UTC / 1674064800 --> <row><v>7.9644330667e+08</v></row>
<!-- 2023-01-19 00:00:00 UTC / 1674086400 --> <row><v>7.9696554667e+08</v></row>
<!-- 2023-01-19 06:00:00 UTC / 1674108000 --> <row><v>5.8408509440e+08</v></row>
</database>
</rra>
I am trying to convert the values in scientific notation (which are in bytes) to megabytes, and write them back in scientific notation, using the Linux bash shell or a script.
So far I have these lines, but I am stuck and don't know how to put the values back into the file after dividing them twice by 1024:
cat Memory_mem_used.xml | grep -Eo '[0-9]+\.[0-9]+e\+[0-9]+' | perl -ne 'printf "%d\n", $_;'
The output should look like this:
output=796443306; output2=$((output / 1024 / 1024)); perl -e 'printf "%.11e\n", '"$output2"
7.59000000000e+02

Try:
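#!/bin/bash
# Rewrite every numeric <v>...</v> value from bytes to whole megabytes,
# keeping scientific notation. Lines without <v> and NaN rows pass through unchanged.
while IFS= read -r line ; do
left=${line%%<v>*}
rest=${line#*<v>}
value=${rest%%</v>*}
right=${rest#*</v>}
if [ "$value" != "$line" ] && [ "$value" != "NaN" ] ; then # line has a numeric <v>
num_value=$(LC_ALL=C printf '%.0f' "$value") # scientific notation -> integer bytes
new_value=$(LC_ALL=C printf '%.11e' $((num_value / 1048576)) ) # integer division, as in the question
line="$left<v>$new_value</v>$right"
fi
printf '%s\n' "$line"
done < input.xml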
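The same substitution can be sketched in Python, which avoids bash's integer-only arithmetic. This is only an illustration and not part of the original answer: the name `bytes_to_megabytes` and the `%.10e` format (chosen to match the ten decimals seen in the dump) are assumptions, and unlike the script above it keeps the fractional part of the megabyte value.

```python
import re

# Matches a non-negative <v>...</v> cell in scientific notation; NaN cells do not match.
V_CELL = re.compile(r"<v>\s*([0-9.]+e[+-][0-9]+)\s*</v>")


def bytes_to_megabytes(line: str) -> str:
    """Return the line with every numeric <v> value divided by 1024 * 1024."""
    def convert(match):
        megabytes = float(match.group(1)) / (1024 * 1024)
        return "<v>%.10e</v>" % megabytes
    return V_CELL.sub(convert, line)


# One row copied from the dump in the question.
sample = "<!-- 2023-01-18 18:00:00 UTC / 1674064800 --> <row><v>7.9644330667e+08</v></row>"
print(bytes_to_megabytes(sample))
```

Like the bash script, this only touches `<v>` cells; the `<value>` and `<primary_value>` elements elsewhere in the dump are left as they are.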

Related

How to create a rrd file with a specific time?

I created an rrd file with a specific start time, but when I convert it to XML, I find that the start time is inconsistent with the time I specified.
The version of rrdtool is 1.5.5.
And the code is
> rrdtool create abc.rrd \
> --step 15 --start 1554122342 \
> DS:sum:GAUGE:120:U:U \
> RRA:AVERAGE:0.5:1:5856 \
> RRA:AVERAGE:0.5:4:20160 \
> RRA:AVERAGE:0.5:40:52704
The first few lines of the dump look like this:
> <!-- 2019-03-31 20:15:15 CST / 1554034515 --> <row><v>NaN</v></row>
> <!-- 2019-03-31 20:15:30 CST / 1554034530 --> <row><v>NaN</v></row>
> <!-- 2019-03-31 20:15:45 CST / 1554034545 --> <row><v>NaN</v></row>
> <!-- 2019-03-31 20:16:00 CST / 1554034560 --> <row><v>NaN</v></row>
> <!-- 2019-03-31 20:16:15 CST / 1554034575 --> <row><v>NaN</v></row>
> <!-- 2019-03-31 20:16:30 CST / 1554034590 --> <row><v>NaN</v></row>
> <!-- 2019-03-31 20:16:45 CST / 1554034605 --> <row><v>NaN</v></row>
> <!-- 2019-03-31 20:17:00 CST / 1554034620 --> <row><v>NaN</v></row>
> <!-- 2019-03-31 20:17:15 CST / 1554034635 --> <row><v>NaN</v></row>
> <!-- 2019-03-31 20:17:30 CST / 1554034650 --> <row><v>NaN</v></row>
> <!-- 2019-03-31 20:17:45 CST / 1554034665 --> <row><v>NaN</v></row>
> <!-- 2019-03-31 20:18:00 CST / 1554034680 --> <row><v>NaN</v></row>
> <!-- 2019-03-31 20:18:15 CST / 1554034695 --> <row><v>NaN</v></row>
> <!-- 2019-03-31 20:18:30 CST / 1554034710 --> <row><v>NaN</v></row>
> <!-- 2019-03-31 20:18:45 CST / 1554034725 --> <row><v>NaN</v></row>
> <!-- 2019-03-31 20:19:00 CST / 1554034740 --> <row><v>NaN</v></row>
> <!-- 2019-03-31 20:19:15 CST / 1554034755 --> <row><v>NaN</v></row>
I also tried other start values, such as the default (now-10s), but the first row is still about one day earlier than the start time.
(My example below was tested with RRDTool 1.5.5.)
Your largest RRA is approximately 1 year long, in 10min intervals, with a 15s step set up for the RRD.
When you create an RRD, the start time is the time of the most recent data point or last update; in other words, you cannot add any data for a time earlier than this. The RRA will be initialised with unknown throughout.
So, when you create your RRD with:
rrdtool create abc.rrd --step 15 --start 1554122342 \
DS:sum:GAUGE:120:U:U RRA:AVERAGE:0.5:40:52704
you can see this using rrdtool info (output trimmed for clarity):
$ rrdtool info abc.rrd
filename = "abc.rrd"
...
last_update = 1554122342
When you then use rrdtool dump to immediately view the content of the RRA, you can see that it starts about a year earlier:
$ rrdtool dump abc.rrd
...
<lastupdate>1554122342</lastupdate> <!-- 2019-04-02 01:39:02 NZDT -->
...
<database>
<!-- 2018-04-01 01:40:00 NZDT / 1522500000 --> <row><v>NaN</v></row>
<!-- 2018-04-01 01:50:00 NZDT / 1522500600 --> <row><v>NaN</v></row>
...
<!-- 2019-04-02 01:20:00 NZDT / 1554121200 --> <row><v>NaN</v></row>
<!-- 2019-04-02 01:30:00 NZDT / 1554121800 --> <row><v>NaN</v></row>
</database>
But wait a minute! This ends on 1554121800, but our last update (start time) was 1554122342! This is a difference of 542. Why would this be?
The reason is that although your step is 15s, the RRA interval is 40 steps, i.e. 600s. The next entry cannot be added until there is 600s of data, and we only have 542. Therefore, the last entry in the RRA is as shown. Note that all intervals are normalised relative to UTC, so the timestamps of your RRA's CDPs (consolidated data points) will always be a multiple of the interval size (in this case, 600), regardless of what you set 'start' to. RRDTool will simply pick the closest. This behaviour becomes much more obvious when you are rolling up to a large time period (e.g. 1 day) and you live in a more extreme timezone (e.g. Auckland, at UTC+13).
Of course, once you write anything to the RRD, lastupdate will change, and the RRA will add however many new points are required (and drop the old ones).
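The arithmetic above can be checked with a short Python sketch. The function name `rra_window` is hypothetical, and the alignment rule (round the last update down to a multiple of the row interval, then step back one interval per row) is inferred from the dumps shown here rather than taken from RRDTool's documentation.

```python
from typing import Tuple


def rra_window(last_update: int, step: int, pdp_per_row: int, rows: int) -> Tuple[int, int]:
    """Return (first_row_time, last_row_time) of an RRA as it appears in a dump."""
    interval = step * pdp_per_row                      # seconds covered by one row
    last_row = last_update - (last_update % interval)  # aligned to the UTC epoch
    first_row = last_row - (rows - 1) * interval
    return first_row, last_row


# The 10-minute RRA from the answer: RRA:AVERAGE:0.5:40:52704
print(rra_window(1554122342, 15, 40, 52704))  # (1522500000, 1554121800)
# The 15-second RRA from the question: RRA:AVERAGE:0.5:1:5856
print(rra_window(1554122342, 15, 1, 5856))    # (1554034515, 1554122340)
```

The second result matches the first row in the question's dump (1554034515): that RRA holds 5856 rows of 15s, i.e. about one day, which is why the dump appears to start a day before the requested start time.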

RRD acceptance criteria

I'm using an RRD to monitor a data source. We are seeing many occasions where the RRD stores a NaN result even though we know data was received, because we also append the received data to a file for testing. When we examine the difference we see the following.
I tried to paste the data as two columns, but it hasn't been structured properly. In essence, what we see below is two columns of a spreadsheet: the left column is the rrd dump and the right column is the actual data (timestamp:value) that arrived at that time.
" <!-- 2017-09-28 06:00:00 UTC / 1506578400 --> <row><v>1.1999200000e+06</v></row>" 1506578412:1202000
" <!-- 2017-09-28 06:05:00 UTC / 1506578700 --> <row><v>1.2538400000e+06</v></row>" 1506578712:1256000
" <!-- 2017-09-28 06:10:00 UTC / 1506579000 --> <row><v>1.2310400000e+06</v></row>" 1506579012:1230000
" <!-- 2017-09-28 06:15:00 UTC / 1506579300 --> <row><v>1.2415200000e+06</v></row>" 1506579312:1242000
" <!-- 2017-09-28 06:20:00 UTC / 1506579600 --> <row><v>1.2304800000e+06</v></row>" 1506579612:1230000
" <!-- 2017-09-28 06:25:00 UTC / 1506579900 --> <row><v>1.2357600000e+06</v></row>" 1506579912:1236000
" <!-- 2017-09-28 06:30:00 UTC / 1506580200 --> <row><v>1.1284800000e+06</v></row>" 1506580212:1124000
" <!-- 2017-09-28 06:35:00 UTC / 1506580500 --> <row><v>1.2238400000e+06</v></row>" 1506580512:1228000
" <!-- 2017-09-28 06:40:00 UTC / 1506580800 --> <row><v>NaN</v></row>" 1506580813:1222000
" <!-- 2017-09-28 06:45:00 UTC / 1506581100 --> <row><v>1.2400000000e+06</v></row>" 1506581112:1240000
" <!-- 2017-09-28 06:50:00 UTC / 1506581400 --> <row><v>1.2284800000e+06</v></row>" 1506581412:1228000
" <!-- 2017-09-28 06:55:00 UTC / 1506581700 --> <row><v>8.9392000000e+05</v></row>" 1506581712:880000
" <!-- 2017-09-28 07:00:00 UTC / 1506582000 --> <row><v>NaN</v></row>" 1506582014:1000000
" <!-- 2017-09-28 07:05:00 UTC / 1506582300 --> <row><v>NaN</v></row>" 1506582315:738000
" <!-- 2017-09-28 07:10:00 UTC / 1506582600 --> <row><v>1.1760000000e+06</v></row>" 1506582613:1176000
" <!-- 2017-09-28 07:15:00 UTC / 1506582900 --> <row><v>1.1874800000e+06</v></row>" 1506582912:1188000
" <!-- 2017-09-28 07:20:00 UTC / 1506583200 --> <row><v>1.2033600000e+06</v></row>" 1506583212:1204000
" <!-- 2017-09-28 07:25:00 UTC / 1506583500 --> <row><v>1.2097600000e+06</v></row>" 1506583512:1210000
" <!-- 2017-09-28 07:30:00 UTC / 1506583800 --> <row><v>1.0717600000e+06</v></row>" 1506583811:1066000
" <!-- 2017-09-28 07:35:00 UTC / 1506584100 --> <row><v>NaN</v></row>" 1506584112:1222000
" <!-- 2017-09-28 07:40:00 UTC / 1506584400 --> <row><v>1.1760000000e+06</v></row>" 1506584412:1176000
" <!-- 2017-09-28 07:45:00 UTC / 1506584700 --> <row><v>1.2048000000e+06</v></row>" 1506584712:1206000
" <!-- 2017-09-28 07:50:00 UTC / 1506585000 --> <row><v>1.0255200000e+06</v></row>" 1506585012:1018000
" <!-- 2017-09-28 07:55:00 UTC / 1506585300 --> <row><v>1.2004000000e+06</v></row>" 1506585312:1208000
" <!-- 2017-09-28 08:00:00 UTC / 1506585600 --> <row><v>1.1676800000e+06</v></row>" 1506585612:1166000
" <!-- 2017-09-28 08:05:00 UTC / 1506585900 --> <row><v>1.2024800000e+06</v></row>" 1506585912:1204000
" <!-- 2017-09-28 08:10:00 UTC / 1506586200 --> <row><v>1.2116800000e+06</v></row>" 1506586212:1212000
" <!-- 2017-09-28 08:15:00 UTC / 1506586500 --> <row><v>NaN</v></row>" 1506586513:886000
" <!-- 2017-09-28 08:20:00 UTC / 1506586800 --> <row><v>1.1940000000e+06</v></row>" 1506586812:1194000
" <!-- 2017-09-28 08:25:00 UTC / 1506587100 --> <row><v>1.1959200000e+06</v></row>" 1506587112:1196000
" <!-- 2017-09-28 08:30:00 UTC / 1506587400 --> <row><v>NaN</v></row>" 1506587413:1206000
" <!-- 2017-09-28 08:35:00 UTC / 1506587700 --> <row><v>1.1440000000e+06</v></row>" 1506587712:1144000
" <!-- 2017-09-28 08:40:00 UTC / 1506588000 --> <row><v>NaN</v></row>" 1506588013:668000
" <!-- 2017-09-28 08:45:00 UTC / 1506588300 --> <row><v>1.2080000000e+06</v></row>" 1506588312:1208000
" <!-- 2017-09-28 08:50:00 UTC / 1506588600 --> <row><v>NaN</v></row>" 1506588613:1156000
" <!-- 2017-09-28 08:55:00 UTC / 1506588900 --> <row><v>1.2080000000e+06</v></row>" 1506588912:1208000
" <!-- 2017-09-28 09:00:00 UTC / 1506589200 --> <row><v>1.1945600000e+06</v></row>" 1506589212:1194000
" <!-- 2017-09-28 09:05:00 UTC / 1506589500 --> <row><v>1.1786400000e+06</v></row>" 1506589512:1178000
" <!-- 2017-09-28 09:10:00 UTC / 1506589800 --> <row><v>1.1396000000e+06</v></row>" 1506589811:1138000
" <!-- 2017-09-28 09:15:00 UTC / 1506590100 --> <row><v>NaN</v></row>" 1506590113:1006000
" <!-- 2017-09-28 09:20:00 UTC / 1506590400 --> <row><v>1.1780000000e+06</v></row>" 1506590412:1178000
" <!-- 2017-09-28 09:25:00 UTC / 1506590700 --> <row><v>1.1799200000e+06</v></row>" 1506590712:1180000
" <!-- 2017-09-28 09:30:00 UTC / 1506591000 --> <row><v>1.1953600000e+06</v></row>" 1506591012:1196000
" <!-- 2017-09-28 09:35:00 UTC / 1506591300 --> <row><v>1.1806400000e+06</v></row>" 1506591312:1180000
" <!-- 2017-09-28 09:40:00 UTC / 1506591600 --> <row><v>1.1588800000e+06</v></row>" 1506591612:1158000
" <!-- 2017-09-28 09:45:00 UTC / 1506591900 --> <row><v>1.2002400000e+06</v></row>" 1506591912:1202000
" <!-- 2017-09-28 09:50:00 UTC / 1506592200 --> <row><v>1.0656800000e+06</v></row>" 1506592212:1060000
" <!-- 2017-09-28 09:55:00 UTC / 1506592500 --> <row><v>1.2078400000e+06</v></row>" 1506592512:1214000
" <!-- 2017-09-28 10:00:00 UTC / 1506592800 --> <row><v>1.1640800000e+06</v></row>" 1506592812:1162000
" <!-- 2017-09-28 10:05:00 UTC / 1506593100 --> <row><v>1.1754400000e+06</v></row>" 1506593112:1176000
We can see that the occasions where the data seems not to be accepted are almost always those where the arrival time is slightly later than the usual pattern.
How can we go about widening the acceptance criteria so that all of these datapoints are accepted?
RRD info for the RRD in question is shown below:
root#ra:/var/www/genie/public_html# rrdtool info /an/data/SI1.rrd
filename = "/an/data/SI1.rrd"
rrd_version = "0003"
step = 300
last_update = 1506594312
header_size = 1000
ds[probe1-temp].index = 0
ds[probe1-temp].type = "GAUGE"
ds[probe1-temp].minimal_heartbeat = 300
ds[probe1-temp].min = 0.0000000000e+00
ds[probe1-temp].max = 5.0000000000e+06
ds[probe1-temp].last_ds = "1226000"
ds[probe1-temp].value = NaN
ds[probe1-temp].unknown_sec = 12
rra[0].cf = "MIN"
rra[0].rows = 1440
rra[0].cur_row = 238
rra[0].pdp_per_row = 12
rra[0].xff = 5.0000000000e-01
rra[0].cdp_prep[0].value = 1.1754400000e+06
rra[0].cdp_prep[0].unknown_datapoints = 2
rra[1].cf = "MAX"
rra[1].rows = 1440
rra[1].cur_row = 1220
rra[1].pdp_per_row = 12
rra[1].xff = 5.0000000000e-01
rra[1].cdp_prep[0].value = 1.2140000000e+06
rra[1].cdp_prep[0].unknown_datapoints = 2
rra[2].cf = "AVERAGE"
rra[2].rows = 1440
rra[2].cur_row = 1205
rra[2].pdp_per_row = 1
rra[2].xff = 5.0000000000e-01
rra[2].cdp_prep[0].value = NaN
rra[2].cdp_prep[0].unknown_datapoints = 0
root#ra:#
You have set the DS heartbeat to 300, but also the step to 300.
This means that, if your data arrive more than 300s apart, they will be stored as NaN, which is what you are seeing. From the figures you give, you can see that on the NaN rows the actual time since the previous update is 301 or 302 sec, which is >300 and so results in a NaN as it exceeds the heartbeat time.
You should normally set the heartbeat to twice the expected data interval, i.e. twice the step, so as to handle this case.
Try setting the heartbeat to 600; this should solve the problem.
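To see which updates fall foul of the heartbeat, here is a minimal Python sketch. It assumes the simplified rule described above (an update is treated as unknown when it arrives more than `heartbeat` seconds after the previous one); the function name `unknown_updates` is hypothetical, and the timestamps are copied from the right-hand column of the question.

```python
def unknown_updates(timestamps, heartbeat):
    """Return the timestamps whose gap from the previous update exceeds the heartbeat."""
    return [
        current
        for previous, current in zip(timestamps, timestamps[1:])
        if current - previous > heartbeat
    ]


# Arrival times from the question, 06:30 to 07:10 UTC.
arrivals = [1506580212, 1506580512, 1506580813, 1506581112, 1506581412,
            1506581712, 1506582014, 1506582315, 1506582613]

print(unknown_updates(arrivals, 300))  # [1506580813, 1506582014, 1506582315]
print(unknown_updates(arrivals, 600))  # []
```

The three flagged timestamps correspond to the NaN rows at 06:40, 07:00 and 07:05. The heartbeat of an existing file can be changed with rrdtool tune, for example rrdtool tune /an/data/SI1.rrd --heartbeat probe1-temp:600, so the RRD does not need to be recreated.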

How to understand Primary Data Point(PDP) in rrdtool database?

If I dump an RRD to XML, the "PDP Status" section contains three elements: <last_ds>, <value> and <unknown_sec>. For example:
<!-- PDP Status -->
<last_ds>90</last_ds>
<value>4.2177496500e+03</value>
<unknown_sec> 184 </unknown_sec>
As I understand it, each time I execute rrdtool update I update the Primary Data Point (PDP). It looks like whatever I pass as a value to rrdtool update (for example rrdtool update test.rrd "N:abc") is shown as the value of the <last_ds> element. However, how is the number for <value> calculated, i.e. the 4217.7496500 in the example above? Is it some kind of average?
Last but not least, I understand that <unknown_sec> shows the number of seconds for which the value of the DS has been unknown, but this counter seems to wrap around at 280 - 295 seconds. How can this be explained? For example, if I execute while true; do rrdtool update test.rrd "N:75"; rrdtool dump test.rrd | grep "<unknown_sec>"; sleep 1; done, where 75 is lower than the lowest value allowed for this DS, the output is the following:
/* data not shown for brevity */
<unknown_sec> 280 </unknown_sec>
<unknown_sec> 281 </unknown_sec>
<unknown_sec> 282 </unknown_sec>
<unknown_sec> 0 </unknown_sec>
<unknown_sec> 1 </unknown_sec>
<unknown_sec> 2 </unknown_sec>
/* data not shown for brevity */
The PDP content of <value> is the sum of all products of input value multiplied by the duration this value was valid for. In order to build the PDP, at the end of the interval, this value is divided by the duration of the interval minus the number of unknown seconds ... The number of unknown seconds resets to 0 when a new interval is started ...
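As a worked illustration of that description, here is a Python sketch with made-up numbers. It is a simplification, not RRDTool's actual code: the function name `pdp_rate` and the sample values are hypothetical, and it ignores the conditions (such as the heartbeat) under which RRDTool marks the whole PDP as unknown.

```python
def pdp_rate(samples, step, unknown_sec=0):
    """samples: (value, seconds_valid) pairs seen inside one step interval.

    Returns (accumulated, pdp), where accumulated plays the role of the
    <value> element and pdp is the value stored at the end of the interval.
    """
    accumulated = sum(value * seconds for value, seconds in samples)
    known = step - unknown_sec  # seconds of the interval with known data
    if known <= 0:
        return accumulated, float("nan")
    return accumulated, accumulated / known


# 300s step: 100s unknown, then 90 valid for 80s and 100 valid for 120s.
accumulated, pdp = pdp_rate([(90, 80), (100, 120)], step=300, unknown_sec=100)
print(accumulated, pdp)  # 19200 96.0
```

This is why <value> can be much larger than any single input: it is a running sum of value times seconds, not yet an average. It also explains the counter in the question: if the step is 300s, <unknown_sec> cannot exceed the interval length before it is reset at the next interval boundary.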

Converting epoch to date & time with time zone in xslt 2.0

How do I convert epoch time to date & time based on time zone in xslt 2.0 ?
For example, epoch time 1212497304 converts to
GMT: Tue, 03 Jun 2008 12:48:24 GMT
time zone: martes, 03 de junio de 2008 14:48:24 GMT+2
The two-hour offset is due to Daylight Saving Time (DST): in many countries clocks are one hour ahead in summer, between dates that vary each year.
For example, one would expect this instruction:
<xsl:value-of select=" format-dateTime(xs:dateTime('2013-07-07T23:08:00+00:00'), '[D] [MNn] [Y] [h]:[m01][PN,*-2] [Z] ([C])', 'en', 'AD', 'IST') "/>
to convert the given GMT date and time into Indian Standard Time (IST) with the Gregorian calendar (AD), but it just prints:
7 July 2013 11:08PM +00:00 (Gregorian)
So it does not shift the time zone.
To shift the time zone we must use:
adjust-dateTime-to-timezone
But this function accepts only a duration in hours/minutes, not a time zone name from which the processor could determine whether DST applies.
Any advice, please?
Edit: This is not really the answer to the actual question asked, which was how to get the correct local time, respecting its timezone, but I'll still leave it here since it might be useful to someone.
Unix time is the number of seconds elapsed since the epoch of midnight 1970-01-01, so you can simply add it to that date as a duration:
<xsl:value-of select="xs:dateTime('1970-01-01T00:00:00') + xs:dayTimeDuration('PT1212497304S')"/>
This will give you the correct xs:dateTime of 2008-06-03T12:48:24
Put into a function:
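<!-- The fn prefix must be bound to a namespace of your own: user-defined
     functions cannot be declared in the reserved XPath functions namespace. -->
<xsl:function name="fn:epochToDate" as="xs:dateTime">
<xsl:param name="epoch"/>
<xsl:variable name="dayTimeDuration" select="concat('PT',$epoch,'S')"/>
<xsl:sequence select="xs:dateTime('1970-01-01T00:00:00') + xs:dayTimeDuration($dayTimeDuration)"/>
</xsl:function>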
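The same arithmetic can be cross-checked outside XSLT. The Python sketch below adds the epoch seconds to 1970-01-01T00:00:00Z and then shifts the result by a fixed offset, which mirrors passing a duration to adjust-dateTime-to-timezone. The function name `epoch_to_datetime` is hypothetical, and a fixed offset does not solve the DST problem raised in the question: the caller still has to know that the offset was +2 on that date.

```python
from datetime import datetime, timedelta, timezone


def epoch_to_datetime(epoch: int, offset_hours: float = 0) -> datetime:
    """Add the epoch seconds to 1970-01-01T00:00:00Z, then shift to a fixed UTC offset."""
    utc = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=epoch)
    return utc.astimezone(timezone(timedelta(hours=offset_hours)))


print(epoch_to_datetime(1212497304).isoformat())     # 2008-06-03T12:48:24+00:00
print(epoch_to_datetime(1212497304, 2).isoformat())  # 2008-06-03T14:48:24+02:00
```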

how can I prune a rrd file by date

Hello, is there a way to prune an rrd file by date? It seems possible, as rrdtool dump file outputs timestamped rows:
<!-- 2012-05-07 19:15:00 UTC / 1336418100 --> <row><v> 0.0000000000e+00 </v></row>
<!-- 2012-05-07 19:20:00 UTC / 1336418400 --> <row><v> 9.6589767000e-01 </v></row>
<!-- 2012-05-07 19:25:00 UTC / 1336418700 --> <row><v> 3.4568563333e-02 </v></row>
<!-- 2012-05-07 19:30:00 UTC / 1336419000 --> <row><v> 9.6402870667e-01 </v></row>
Thanks
You can edit the dump file prior to restore ... not sure what you mean by pruning, since RRD files always stay the same size.
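One possible reading of "pruning" is to blank every row older than a cutoff before restoring the file. The Python sketch below does that on a line-by-line basis; it is an assumption about what is wanted, the function name `blank_row_before` is hypothetical, and it relies on each row sitting on one line with its epoch time in the preceding comment, as in the dump shown in the question.

```python
import re

# The comment in front of each row carries the row's epoch time after " / ".
TIMESTAMP = re.compile(r"<!-- .* / (\d+) -->")
VALUE = re.compile(r"<v>[^<]*</v>")


def blank_row_before(line: str, cutoff: int) -> str:
    """Replace every <v> value with NaN if the row is older than cutoff (epoch seconds)."""
    match = TIMESTAMP.search(line)
    if match and "<row>" in line and int(match.group(1)) < cutoff:
        return VALUE.sub("<v>NaN</v>", line)
    return line


old = "<!-- 2012-05-07 19:15:00 UTC / 1336418100 --> <row><v> 0.0000000000e+00 </v></row>"
new = "<!-- 2012-05-07 19:20:00 UTC / 1336418400 --> <row><v> 9.6589767000e-01 </v></row>"
print(blank_row_before(old, 1336418400))
print(blank_row_before(new, 1336418400))
```

After filtering the whole dump this way, the edited XML can be turned back into an RRD with rrdtool restore. The file keeps the same number of rows, in line with the remark above that RRD files always stay the same size.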