Why are the values used to update an RRD different from the fetched values?
I used this for updating: 1353702000:2000
and I got this when I fetch: 1353702000: 1.6666666667e+00
Is there a way to get back the number I entered?
Is there a way to format the timestamp and the numbers?
Details:
I created this database:
rrdtool create datafile.rrd DS:packets:ABSOLUTE:900:0:10000000 RRA:AVERAGE:0.5:1:9600 RRA:AVERAGE:0.5:4:9600 RRA:AVERAGE:0.5:24:6000
I updated the database with this timestamp and value:
rrdtool update datafile.rrd 1353702000:2000
I fetched the database with this:
rrdtool fetch datafile.rrd AVERAGE -r 90 -s -1h
and I got this
1353700800: nan
1353701100: nan
1353701400: nan
1353701700: 1.6666666667e+00
1353702000: 1.6666666667e+00
1353702300: 3.3333333333e+00
1353702600: 3.3333333333e+00
1353702900: 6.6666666667e+00
1353703200: nan
1353703500: nan
1353703800: nan
1353704100: nan
1353704400: nan
Thanks
Use GAUGE as the data source type, not ABSOLUTE.
There are two reasons you are seeing these values.
Firstly, you have type ABSOLUTE for the data source, which means the submitted value is divided by the time since the last update to give a per-second rate; for example, if the previous update had been 1200 seconds earlier, you would get 2000 / 1200 ≈ 1.67 per second, which matches the fetched 1.6666666667e+00. If you want the value stored as-is, use type GAUGE. If the value is constantly increasing - such as with an SNMP network interface packet counter - then use COUNTER to get a rate of change.
Secondly, data normalisation. If the samples are not on the interval boundary (i.e. timestamp mod 300 = 0 in this case), they will be adjusted to fit the interval boundaries. To avoid this, submit the samples with timestamps that fall exactly on the interval boundary.
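A minimal sketch of the GAUGE approach, using the python-rrdtool bindings (an assumption on my part; the equivalent rrdtool command lines you are already using work the same way). The --start value is only there so that it predates the first update:

import rrdtool

# Same layout as in the question, but with GAUGE so the submitted value is stored as-is
rrdtool.create(
    "datafile.rrd",
    "--start", "1353701700",
    "--step", "300",
    "DS:packets:GAUGE:900:0:10000000",
    "RRA:AVERAGE:0.5:1:9600",
    "RRA:AVERAGE:0.5:4:9600",
    "RRA:AVERAGE:0.5:24:6000",
)

# 1353702000 is a multiple of 300, so no normalisation is applied
rrdtool.update("datafile.rrd", "1353702000:2000")

A fetch covering that interval should then report 2.0000000000e+03 against 1353702000 rather than a per-second rate.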
Related
This question is close, but doesn't quite help me with my similar issue, as I am using a single dataset and no related time series.
I am using AWS Forecast with a single time series dataset (no related data, just the main DS). It is a daily dataset with about 10 years of data ranging from 2010 to 2020.
I have 3572 data points in the original dataset; I manually filled missing data to ensure there were no missing days in the date range, for a total of 3739 data points. I lopped off everything in 2020 to create a validation dataset and then configured the predictor for a 180-day forecast. I keep getting the following error:
Unable to evaluate this dataset because there is missing data in the evaluation window for all items. Ensure that there is complete data for at least one item in the evaluation window starting from 2019-03-07T00:00:00 up to 2020-01-01T00:00.
There is definitely no missing data. I've double- and triple-checked the date range and the data fill, and every day between the start and end dates has a data point. I also tried adding a data point for 1/1/2020 (the data ended at 12/31/2019), and I still get this error. I can't figure out what it's asking me for, except that maybe I'm missing something in my math about the forecast horizon and backtest window offset.
Dataset example:
Brief model parameters (can share more if I'm missing something pertinent):
Total data points in training data: 3479
forecastHorizon = 180
create_predictor_response = forecast.create_predictor(
    PredictorName=predictorName,
    ForecastHorizon=forecastHorizon,
    PerformAutoML=True,
    PerformHPO=False,
    EvaluationParameters={"NumberOfBacktestWindows": 1,
                          "BackTestWindowOffset": 180},
    InputDataConfig={"DatasetGroupArn": datasetGroupArn},
    FeaturizationConfig={"ForecastFrequency": 'D'})
I noticed you don't have an entry for 6/24/10 (this American date format is the worst, btw).
I faced a similar problem when leaving out days like that (assuming you're modelling at a daily frequency) and having Forecast's automatic gap filling set missing values to NaN (as opposed to zero, which is the default). I suggest you:
pre-fill literally every date within the range of the training data (and of the forecast window, if using related data)
choose zero as the option for automatically filling missing values, as in the sketch below; I think mean or any other float value would also work, for that matter
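For the zero-fill option, you can tell Forecast to fill gaps in the target series through the predictor's featurization pipeline. A rough sketch of the extra FeaturizationConfig, based on my reading of the CreatePredictor API ("target_value" assumes that is the attribute name in your TARGET_TIME_SERIES schema, so double-check the names against the docs):

featurization_config = {
    "ForecastFrequency": "D",
    "Featurizations": [{
        "AttributeName": "target_value",
        "FeaturizationPipeline": [{
            "FeaturizationMethodName": "filling",
            "FeaturizationMethodParameters": {
                "frontfill": "none",    # leading gaps are left alone
                "middlefill": "zero",   # gaps inside the series become 0
                "backfill": "zero",     # trailing gaps become 0
            },
        }],
    }],
}

# then pass FeaturizationConfig=featurization_config to forecast.create_predictor(...)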
Let me know if that works! I am also using Forecast, and it's good to keep track of possible problems and solutions.
I recently needed to connect to PostgreSQL, and I need to store a date/time in my object to pass to the queries that insert into and update some tables.
But there is no clear way in C++ to store and retrieve date/time values.
Any suggestions?
PostgreSQL date/time types (9.x documentation): https://www.postgresql.org/docs/9.1/datatype-datetime.html
A timestamp is an 8-byte (64-bit) integer with microsecond precision, stored in UTC without a time zone (see the top of the table in the docs).
When you create a table, you can either have PostgreSQL timestamp the record itself with CURRENT_TIMESTAMP, or insert the value yourself as a 64-bit integer in microseconds. Since PostgreSQL supports multiple time formats, you should decide which format you want to read back from the table.
PostgreSQL approach: CREATE, INSERT, RETRIEVE
"CREATE TABLE example_table(update_time_column TIMESTAMP DEFAULT CURRENT_TIMESTAMP)"
"INSERT INTO example_table(update_time_column) VALUES update_time_column=CURRENT_TIMESTAMP"
"SELECT (EXTRACT(epoch FROM update_time_column)*1000000) FROM example_table"
C++ approach:
// e.g. with <chrono> (also include <cstdint>): microseconds since the Unix epoch as a 64-bit integer
int64_t cppTime = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::system_clock::now().time_since_epoch()).count();
something similar to this answer: Getting an accurate execution time in C++ (micro seconds)
Then push your object/record to PostgreSQL. When you retrieve the value in microseconds, adjust the precision as needed (e.g. divide by 1000 for milliseconds).
Just don't forget to keep the PostgreSQL and C++ timestamp widths in sync (e.g. 8 bytes on each side); otherwise the value will be truncated on one side and you will lose precision or get unexpected times.
I am trying to execute an HBase client Put operation with a calculated timestamp equivalent to "Thu Sep 21 12:50:34 EDT 1950", which is a negative epoch value.
Use Case: Setting a past creation date to execute an expected TTL on values
Reference: How to set a future insert date in Google Cloud Bigtable? Trying to calculate it using TTL
Error:
Caused by: java.lang.IllegalArgumentException: Timestamp cannot be negative. ts=-608368165717
Code:
long now = System.currentTimeMillis();
long colFam_TTL = cfDescriptor1.getTimeToLive() * 1000; // TTL: Forever => 2147483647 Seconds
long expiry_past = now - colFam_TTL;
long creationTime = (expiry_past + 3600000); // Expected TTL : 1 hour(3600000 ms)
Put p = new Put(rowkey, creationTime);
The calculation works fine for column families with TTL set as maxage:1h or maxage:5d
Is there a workaround for this? How can I set a similar ts?
I'm not quite clear on what you are trying to accomplish, but as noted by another commenter, the issue is that Cloud Bigtable timestamps must be non-negative.
In this case, the problem is that, as written, your creationTime comes out negative: with a TTL of 2147483647 seconds (roughly 68 years), now - colFam_TTL lands well before the Unix epoch. The timestamp must be a non-negative value, interpreted as milliseconds since the Unix epoch, for any of the time-based GC expressions.
It seems like what you really want is to compute colFam_TTL - now, rather than the opposite, and clamp the result at 0.
I created a standard RRDTool database with a default step of 5 min (300 s).
I have different types of values in it: some gauges, which are easily processed, but also other values I would like to store as COUNTER. Here is my problem:
I read the data in a program and take the difference between values over two steps, which is fine, but the counter increments by less than the elapsed time (it can increase by less than 300 during a step), so my output value is wrong.
Is it possible to change COUNTER so that it is not a per-second figure but a per-step one, or something like that? If not, I suppose I have to calculate the difference in my program.
Thank you for helping.
RRDTool is capable of handling fractional values, so there is no problem if the counter increments by less than the seconds interval since the last update.
RRDTool stores everything as a rate. If your DS is of type GAUGE, then RRDTool assumes that the incoming value is already a rate, and only applies Data Normalisation (more on this later). If the type is COUNTER or DERIVE, then the value/timepoint you are updating with is compared to the previous value/timepoint to obtain a rate, thus: r = (x2 - x1)/(t2 - t1). The rate obtained is then normalised. The other DS type is ABSOLUTE, which assumes the counter was reset on the last read, giving r = x2/(t2 - t1).
The Normalisation step adjusts the data point based on assuming a linear progression from the last data point so that it lies exactly on an interval boundary. For example, if your step is 5min, and you update at 12:06, the data point is adjusted back to what it would have been at 12:05, and stored against 12:05. However the last unadjusted DP is still preserved for use at the next update, so that overall rates are correct.
So, if you have a 300s (5min) interval, and the value increased by 150, the rate stored will be 0.5.
If the value you are graphing is something small, e.g. 'number of pages printed', this might seem counterintuitive, but it works well for large rates such as network traffic counters (which is what RRDTool was designed for).
If you really do not want to display fractional values in the generated graphs or output, then you can use a format string such as %.0f to enforce no decimal places and the displayed number will be rounded to the nearest integer.
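If you would rather see the per-step increment (e.g. pages printed per 5 minutes) than the per-second rate, you can also multiply the rate back up at graph time. A rough sketch using the python-rrdtool bindings, with a made-up file name and DS name ('pages'):

import rrdtool

rrdtool.graph(
    "pages.png",
    "--start", "-1d",
    "DEF:rate=datafile.rrd:pages:AVERAGE",
    "CDEF:pages=rate,300,*",            # per-second rate multiplied by the 300 s step
    "LINE1:pages#0000FF:pages per step",
    "GPRINT:pages:AVERAGE:avg %.0lf",   # no decimal places in the printed figure
)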
I am trying to create a COMMAND JSON datasource to monitor some values, for example with a script such as:
import json
from random import random

print(json.dumps({
    'values': {
        '': {'random': random()},   # the outer key (left empty here) maps to datapoint name/value pairs
    },
    'events': []
}))
And when I start zencommand, the appropriate RRD file is created, but the cur, avg and max values on the graph show NaN. The NaNs are replaced by actual numbers when I zoom in to the current point in time, which is not very far from the start of monitoring.
Why doesn't it show correct min, max and avg values before I zoom in? Is that somehow related to consolidation? I read http://www.vandenbogaerdt.nl/rrdtool/min-avg-max.php, but that page doesn't say anything about NaN values.
And is there a way to zoom in to the current timestamp more quickly, to see some data sooner?
When you are zoomed out, you'll be looking at the lower-granularity RRAs (Round Robin Archives). These do not get populated until enough data are in the higher-granularity ones; so, for example, if you have a 5min-granularity RRA, a 1hr-granularity RRA, and a 1day-granularity RRA, and have collected data for the last 45min, then you will see ~8 data points in your 'daily' graph (which uses the 5min RRA), but nothing in your 'monthly' (which will use the 1hr RRA) or your 'yearly' (which uses the 1day RRA).
This applies to any RRA; AVG, LAST, MAX, etc. Until the consolidated time window is complete, and the full complement of Primary Data Points has been collected for consolidation, the consolidated data point value is undefined.
RRDTool picks the RRA to use based on the requested graph data width and pixel width, as well as the requested consolidation functions. Although there are ways to force RRDtool to use a higher-granularity RRA than it needs to, and to consolidate on the fly, this is inefficient and slow. It also makes having the lower-granularity RRA pointless and throws away one of the major benefits of RRDtool: that it performs consolidation at update time, making graphing faster.
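To make the granularity relationship concrete, here is a sketch of an RRD with three such RRAs (python-rrdtool bindings; the DS name, heartbeat and row counts are illustrative assumptions, not taken from your Zenoss template):

import rrdtool

rrdtool.create(
    "example.rrd",
    "--step", "300",              # base step: 5 minutes
    "DS:value:GAUGE:600:U:U",
    "RRA:AVERAGE:0.5:1:2016",     # 5 min resolution, kept for about a week
    "RRA:AVERAGE:0.5:12:1488",    # 1 hr resolution (12 PDPs per CDP), about two months
    "RRA:AVERAGE:0.5:288:366",    # 1 day resolution (288 PDPs per CDP), about a year
)

After 45 minutes of collection, the first RRA holds a handful of points, but the 1 hr RRA has not yet completed a consolidation window and the 1 day RRA is even further from one, which is why the zoomed-out graphs show NaN at first.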