SAS group rows based on time intervals

I am doing some analysis on a dataset with a variable named "time". The time is in the format HH:MM:SS.
Now I want to group the rows into 5-second and 1-minute time intervals, respectively, in two different analyses.
I checked some StackOverflow posts online, but it seems that they used a variable called "Interval" which starts at 1 and increases when the interval ends.
data _quotes_interval;
set _quotes_processed;
interval = intck('MINUTE1',"09:30:00"t,time)+1;
run;
What I want to do is to keep the time format. For example, if the original time is 9:00:30, and I am doing the 1 min interval, I want to change the time to 9:00:00 instead.
Is there a way to do this?

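SAS stores a time value as the number of seconds since midnight. To round to the nearest minute or the nearest 5 minutes, express the interval in seconds (60 and 5*60 = 300) and pass it as the rounding unit to the SAS ROUND function. Note that ROUND goes to the nearest multiple, so 9:00:30 becomes 9:01:00 rather than 9:00:00.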
time_nearest_minute = round(time, 60);
time_nearest_minute5 = round(time, 300);
Edit based on comment:
Time_nearest_second5 = round(time, 5);

For the processing by minute, you can use
data _quotes_interval / view = _quotes_interval;
set _quotes_processed;
interval = intnx('MINUTE', time, 0, 'begin');
run;
intnx is normally used to add or subtract time intervals. Here I add 0 minutes, only because I need the function's fourth parameter, which moves the result back to the 'begin' of the 1-minute interval.
PS: For performance reasons, I would use the view= option to create a view on the existing data instead of copying all the data.
For processing by 5 second intervals
try interval = intnx('SECOND5', time, 0, 'begin');
Disclaimer:
I do not have SAS on this computer. If it does not work, react in a comment and I will test it at work.
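To make the difference between the two approaches concrete, here is a minimal Python sketch (not SAS) of the arithmetic on a seconds-since-midnight value. The helper names floor_to_interval, round_to_interval and hhmmss are hypothetical, and the comparison to intnx(..., 0, 'begin') and round(time, width) is an assumption about how those SAS calls behave on time values.

```python
def floor_to_interval(seconds: int, width: int) -> int:
    """Start of the interval that contains `seconds` (truncate downwards)."""
    return seconds - seconds % width


def round_to_interval(seconds: int, width: int) -> int:
    """Nearest multiple of `width`; exact ties are rounded up."""
    return (seconds + width // 2) // width * width


def hhmmss(seconds: int) -> str:
    """Format seconds since midnight as HH:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


t = 9 * 3600 + 30  # 09:00:30 as seconds since midnight

print(hhmmss(floor_to_interval(t, 60)))     # 1-minute bucket, truncated
print(hhmmss(round_to_interval(t, 60)))     # 1-minute bucket, nearest
print(hhmmss(floor_to_interval(t + 3, 5)))  # 5-second bucket for 09:00:33
```

Truncating keeps 09:00:30 in the 09:00:00 bucket, which is what the question asks for, while rounding to the nearest minute moves it to 09:01:00.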

Related

RRDTool data values (e.g. max value) are different in different time resolutions

I'm currently experimenting a bit with RRDTool. I'm aware that accuracy gets lower the longer the selected time period is, but I thought I could bypass this with my data source settings.
For example, temperature and humidity from my house at a resolution of 1h:
And now at a resolution of 1d:
As you can see, there is a big difference in the max. value of the blue line.
I created my data sources and archives with these values:
"rrdtool create temp.rrd --step 30",
"DS:temp:GAUGE:60:U:U",
"DS:humidity:GAUGE:60:U:U",
"RRA:AVERAGE:0.5:1:1051200",
"RRA:MAX:0.5:1:1051200",
"RRA:MIN:0.5:1:1051200",
I thought that 1051200 rows (1 year = 31536000 s / 30 s resolution = 1051200) would be enough to save every value for a year, so there should be no need for interpolation.
Is it possible to get the exact values displayed even if the resolution changes (for example the max humidity (Luftfeuchtigkeit) at 99.9%)?
Here are my values for image creation:
"--start" => "-1h", (-1d etc-)
"--title" => "Haustemperatur",
"--vertical-label" => "°C / % RLF",
"--width" => 800,
"--height" => 600,
"--lower-limit" => "-5",
"DEF:temperatur=$rrdFile:temperatur:LAST",
"DEF:humidity=$rrdFile:humidity:LAST",
"LINE1:temperatur#33CC33:Temperatur",
"GPRINT:temperatur:LAST:\t\tAktuell\: %4.2lf °C",
"GPRINT:temperatur:AVERAGE:Schnitt\: %4.2lf °C",
"GPRINT:temperatur:MAX:Maximum\: %4.2lf °C\j",
"LINE1:humidity#0000FF:Relative Luftfeuchtigkeit",
"GPRINT:humidity:LAST:Aktuell\: %4.2lf %%",
"GPRINT:humidity:AVERAGE:Schnitt\: %4.2lf %%",
"GPRINT:humidity:MAX:Maximum\: %4.2lf %%\j",
Thanks for your help and any suggestions.
P.S. I'm using a library to generate the graphs and the database, please do not be surprised about possible syntax errors.

Your problem is that you are causing the values to be rolled-up on the fly at graph time, but have not correctly specified which rollup function to use. Your second graph is showing the MAXIMUM of the LAST in the interval, not the true Maximum.
There are a few issues to explain with this configuration:
Firstly, your RRD is defined using 3 RRAs with 1cdp=1pdp and different consolidation functions (AVG, MIN, MAX). This means they are functionally identical, but they do not save you any time at graphing as they have not done any pre-rollup for you! You should definitely consider having just one of these (probably AVG) and adding others at lower resolution to help speed up graphing when you have a bigger time window.
Secondly, you need to specify the on-the-fly rollup function. When graphing, RRDTool will work out the best RRA to use based on your DEF lines, and will perform any additional consolidation required on the fly. This can take a long time if the only available RRA is too high-granularity.
Your graph request uses DEF:temperatur=$rrdFile:temperatur:LAST but you do not actually have a LAST type RRA, so RRDTool will grab the last average. Your RRA data points are at 30s interval, but your second graph has (approx) 5min per pixel, meaning that RRDTool needs to grab the 10 entries from the RRA, and print the last. Looking at the data in the top graph, it seems that the last in that interval was the 66 value, though previous ones were 100.
So you have a choice. Do you want the graph to show the average for the time period, the maximum, or both? Do you want the figures at the bottom to show the maximum of the average, or the maximum of everything?
For example
"DEF:temperatur=$rrdFile:temperatur:AVERAGE",
"DEF:humidity=$rrdFile:humidity:AVERAGE",
"DEF:temperaturmax=$rrdFile:temperatur:MAX:reduce=MAX",
"DEF:humiditymax=$rrdFile:humidity:MAX:reduce=MAX",
"LINE1:temperatur#33CC33:Temperatur",
"LINE1:temperaturmax#66EE66:Maximum Temperatur",
"GPRINT:temperatur:LAST:\t\tAktuell\: %4.2lf °C",
"GPRINT:temperatur:AVERAGE:Schnitt\: %4.2lf °C",
"GPRINT:temperaturmax:MAX:Maximum\: %4.2lf °C\j",
"LINE1:humidity#0000FF:Relative Luftfeuchtigkeit",
"LINE1:humiditymax#3333FF:Maximum Luftfeuchtigkeit",
"GPRINT:humidity:LAST:Aktuell\: %4.2lf %%",
"GPRINT:humidity:AVERAGE:Schnitt\: %4.2lf %%",
"GPRINT:humiditymax:MAX:Maximum\: %4.2lf %%\j",
In this case, we define a separate DEF for the maximum data set, so that we can always obtain the highest value even after consolidation. This is also used in the GPRINT so that we get the MAX of the MAX rather than the MAX of the AVERAGE. The Maximum line is now drawn separately to the average line, so that we can see the effect of any rollup of data - the lines will be together at high-resolution but get further apart as the time window widens and resolution decreases.
The DEF is set to force any rollup function used for the maxima to be MAX rather than AVERAGE, so we can be sure to get the maximum rather than the average of the maxima.
We are also using AVERAGE rather than LAST in order to get more meaningful data after rollup. Note that we could also use a separate DEF for the LAST as well if we wanted to though it is of less usefulness.
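To illustrate why the choice of rollup function matters, here is a small Python sketch (not RRDTool itself) that consolidates ten 30-second samples into a single graph pixel. The sample values and the consolidate helper are hypothetical; they only mimic the situation described above, where the last value in the interval was 66 although earlier ones were 100.

```python
# Ten hypothetical 30-second humidity samples that fall into one 5-minute pixel.
samples = [100.0, 100.0, 100.0, 99.9, 98.0, 90.0, 80.0, 75.0, 70.0, 66.0]


def consolidate(values, cf):
    """Reduce one bucket of samples with the named consolidation function."""
    if cf == "LAST":
        return values[-1]
    if cf == "MAX":
        return max(values)
    if cf == "MIN":
        return min(values)
    if cf == "AVERAGE":
        return sum(values) / len(values)
    raise ValueError(f"unknown consolidation function: {cf}")


for cf in ("LAST", "AVERAGE", "MAX"):
    print(cf, consolidate(samples, cf))
```

With LAST the peak disappears from the bucket entirely, so a later "maximum" computed over such buckets can never recover it. Reducing with MAX preserves it.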
Note that, if you ever expect to be generating graphs over more than a few days, you should definitely consider adding some lower-resolution RRAs for AVERAGE and MAX or else the graphs will generate very slowly. RRDTool is designed with the intention that data will be rolled up over time, rather than (as in a traditional database) every sample kept as-is. So, unless you really need to have 30s resolution data kept for an entire year, you may prefer to keep this high resolution data for only a week, and then have separate RRAs that roll up to 1 hour resolution and keep for longer. Many people keep the 30s for 2 days, then 30min-summary for 2 weeks, 2h-summary for 2 months, and then 1day-summary for 2 years.
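The retention scheme mentioned above can be turned into RRA definitions with a little arithmetic. The following Python sketch assumes the 30-second step from the question, treats "2 months" as 60 days and "2 years" as 730 days, and uses an xff of 0.5 as in the original RRAs. The helper rra_lines is hypothetical.

```python
STEP = 30    # seconds per primary data point, as in "--step 30"
DAY = 86400  # seconds per day

# (resolution in seconds, retention in seconds) for each tier.
# Assumption: 2 months = 60 days, 2 years = 730 days.
tiers = [
    (30, 2 * DAY),     # 30 s for 2 days
    (1800, 14 * DAY),  # 30 min for 2 weeks
    (7200, 60 * DAY),  # 2 h for 2 months
    (DAY, 730 * DAY),  # 1 day for 2 years
]


def rra_lines(step, tiers, xff=0.5):
    """Build RRA:CF:xff:steps:rows strings for each tier."""
    lines = []
    for resolution, keep in tiers:
        steps = resolution // step  # primary data points per consolidated row
        rows = keep // resolution   # rows needed to cover the retention time
        # At steps == 1 nothing is consolidated, so one CF is enough.
        cfs = ("AVERAGE",) if steps == 1 else ("AVERAGE", "MAX")
        for cf in cfs:
            lines.append(f"RRA:{cf}:{xff}:{steps}:{rows}")
    return lines


lines = rra_lines(STEP, tiers)
for line in lines:
    print(line)
```

This comes to a few thousand rows in total, compared with 1051200 rows per RRA in the original definition.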
For more information, see the RRDTool manual pages.

AWS IoT Analytics Delta Window

I am having real problems getting the AWS IoT Analytics Delta Window (docs) to work.
I am trying to set it up so that every day a query is run to get the last 1 hour of data only. According to the docs the schedule feature can be used to run the query using a cron expression (in my case every hour) and the delta window should restrict my query to only include records that are in the specified time window (in my case the last hour).
The SQL query I am running is simply SELECT * FROM dev_iot_analytics_datastore and if I don't include any delta window I get the records as expected. Unfortunately when I include a delta expression I get nothing (ever). I left the data accumulating for about 10 days now so there are a couple of million records in the database. Given that I was unsure what the optimal format would be I have included the following temporal fields in the entries:
datetime : 2019-05-15T01:29:26.509
(A string formatted using ISO Local Date Time)
timestamp_sec : 1557883766
(A unix epoch expressed in seconds)
timestamp_milli : 1557883766509
(A unix epoch expressed in milliseconds)
There is also a value automatically added by AWS called __dt, which uses the same format as my datetime except that it seems to be accurate only to within 1 day, i.e. all values entered within a given day have the same value (e.g. 2019-05-15 00:00:00.00).
I have tried a range of expressions (including the suggested AWS expression) from both standard SQL and Presto as I'm not sure which one is being used for this query. I know they use a subset of Presto for the analytics so it makes sense that they would use it for the delta but the docs simply say '... any valid SQL expression'.
Expressions I have tried so far with no luck:
from_unixtime(timestamp_sec)
from_unixtime(timestamp_milli)
cast(from_unixtime(unixtime_sec) as date)
cast(from_unixtime(unixtime_milli) as date)
date_format(from_unixtime(timestamp_sec), '%Y-%m-%dT%h:%i:%s')
date_format(from_unixtime(timestamp_milli), '%Y-%m-%dT%h:%i:%s')
from_iso8601_timestamp(datetime)
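One thing worth checking with expressions like these is the unit of the epoch value. As far as I know, Presto's from_unixtime expects seconds, so a millisecond value would need to be divided by 1000 first. The Python sketch below (not Presto) uses the sample values from the question and assumes they are in UTC to show the difference.

```python
from datetime import datetime, timedelta, timezone

timestamp_sec = 1557883766        # sample value from the question
timestamp_milli = 1557883766509   # sample value from the question

# Seconds can be converted directly.
from_sec = datetime.fromtimestamp(timestamp_sec, tz=timezone.utc)

# Milliseconds must be scaled down to seconds before conversion.
from_milli = datetime.fromtimestamp(
    timestamp_milli // 1000, tz=timezone.utc
) + timedelta(milliseconds=timestamp_milli % 1000)

# Treating the millisecond value as seconds lands far in the future.
years_if_misread = timestamp_milli / (365.25 * 86400)

print(from_sec.isoformat())
print(from_milli.isoformat(timespec="milliseconds"))
print(f"misread as seconds: about {years_if_misread:,.0f} years after 1970")
```

Both conversions agree with the datetime field (2019-05-15T01:29:26.509), whereas a timestamp tens of thousands of years in the future would never fall inside a one-hour window.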

What are the offset and time expression parameters that you are using?
Since delta windows are effectively filters inserted into your SQL, you can troubleshoot them by manually inserting the filter expression into your data set's query.
Namely, applying a delta window filter with -3 minute (negative) offset and 'from_unixtime(my_timestamp)' time expression to a 'SELECT my_field FROM my_datastore' query translates to an equivalent query:
SELECT my_field FROM
(SELECT * FROM "my_datastore" WHERE
(__dt between date_trunc('day', iota_latest_succeeded_schedule_time() - interval '1' day)
and date_trunc('day', iota_current_schedule_time() + interval '1' day)) AND
iota_latest_succeeded_schedule_time() - interval '3' minute < from_unixtime(my_timestamp) AND
from_unixtime(my_timestamp) <= iota_current_schedule_time() - interval '3' minute)
Try using a similar query (with no delta time filter) with the correct values for offset and time expression and see what you get. The (__dt between ...) clause is just an optimization for limiting the scanned partitions. You can remove it for the purposes of troubleshooting.
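The window logic of that filter can be sketched in Python to see which records a given run would pick up. This is only an illustration of the two comparisons in the query above. The function in_delta_window and the example times are hypothetical, and last_success and current stand in for iota_latest_succeeded_schedule_time() and iota_current_schedule_time().

```python
from datetime import datetime, timedelta


def in_delta_window(ts, last_success, current, offset):
    """True if ts lies in (last_success + offset, current + offset]."""
    return last_success + offset < ts <= current + offset


offset = timedelta(minutes=-3)                # the -3 minute offset
last_success = datetime(2019, 5, 15, 10, 0)   # hypothetical previous run
current = datetime(2019, 5, 15, 11, 0)        # hypothetical current run

for ts in (
    datetime(2019, 5, 15, 9, 57),   # exactly on the lower bound: excluded
    datetime(2019, 5, 15, 10, 30),  # inside the window
    datetime(2019, 5, 15, 10, 57),  # exactly on the upper bound: included
    datetime(2019, 5, 15, 10, 58),  # too recent: left for the next run
):
    print(ts.time(), in_delta_window(ts, last_success, current, offset))
```

If the timestamp expression evaluates to something outside this range for every record (for example because of a unit or time zone mismatch), the data set will always be empty.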

Please try the following:
Set query to SELECT * FROM dev_iot_analytics_datastore
Data selection filter:
Data selection window: Delta time
Offset: -1 Hours
Timestamp expression: from_unixtime(timestamp_sec)
Wait for dataset content to run for a bit, say 15 minutes or more.
Check contents

After several weeks of testing and trying all the suggestions in this post along with many more, it appears that the extremely technical answer was to 'switch off and back on'. I deleted the whole analytics stack and rebuilt everything with different names, and it now seems to be working!
It's important to note that, even though I have flagged this as the correct answer because it was the actual resolution, both of the answers provided by @Populus and @Roger would have been correct had my deployment been functioning as expected.

I found by chance that changing SELECT * FROM datastore to SELECT id1, id2, ... FROM datastore solved the problem.

AWS CloudWatch metric math with a cumulative metric's value 30 minutes ago to show rate of change

I have an AWS CloudWatch custom metric that represents a cumulative value which continues to increase over time. I will add that metric to a dashboard, but I also want to show the rate of change of this metric over the last 30 minutes. Ideally I would like a function that returns the metric's value from 30 minutes ago so that I can subtract it from the current value. The "Rate()" function does not seem to help.
I could submit the metric's value a second time with a timestamp that is 30 minutes in the future and subtract these two metrics, but I am hoping for a solution that uses metric math and does not force me to submit another metric. I can think of other use cases where I might want to do math with metrics from different time periods.
Hope I am just missing something here!

You can use some arithmetic to obtain the previous value and then you're able to calculate the percentage of change as you want.
The value you want is: (value_now - value_before) / value_before
Breaking this into 2 parts:
Obtain value_now - value_before. This is the absolute delta of the values.
Obtain value_before. This is the value of the metric at the previous datapoint.
Assuming that your metric in CloudWatch is m.
Step 1: The absolute delta
The absolute_delta can be obtained with: absolute_delta = RATE(m) * PERIOD(m).
Step 2: The previous value
With some arithmetic it is possible to obtain previous_value. Given the definition of absolute delta:
absolute_delta = value_now - value_before
Since we have value_now = m and absolute_delta, then it's a matter of inverting the equation:
value_before = value_now - absolute_delta
Final equation
Just plug everything together and you have your final metric:
change_percentage = 100 * absolute_delta / value_before
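In CloudWatch terms, with the metric as m, this is the metric math expression 100 * (RATE(m) * PERIOD(m)) / (m - RATE(m) * PERIOD(m)).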

Metric math function RATE() calculates the rate of change per second.
Returns the rate of change of the metric, per second. This is calculated as the difference between the latest data point value and the previous data point value, divided by the time difference in seconds between the two values.
From https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html
So to get the rate of change for your period you could do this:
RATE(m1)*PERIOD(m1)
and set the period of the dashboard to the wanted value.
Problem in your case is that you need it for a period of 30 min, I don't think you can set 30 min as period on the CloudWatch dashboard. Closest values would be 15 min or 1 hour.
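As a sanity check on the RATE(m) * PERIOD(m) identity, the arithmetic can be simulated in Python on a made-up cumulative series. The values, the 300-second period, and the helpers rate and percent_change are all hypothetical. The sketch assumes evenly spaced datapoints and a non-zero previous value.

```python
PERIOD = 300  # seconds between datapoints (hypothetical)
m = [1000.0, 1030.0, 1090.0, 1090.0, 1150.0]  # hypothetical cumulative metric


def rate(series, period):
    """Per-second change versus the previous datapoint; None for the first."""
    return [None] + [
        (cur - prev) / period for prev, cur in zip(series, series[1:])
    ]


def percent_change(series, period):
    """Percentage change of each datapoint relative to the previous one."""
    result = []
    for value_now, r in zip(series, rate(series, period)):
        if r is None:
            result.append(None)  # no previous datapoint to compare with
            continue
        absolute_delta = r * period                 # RATE(m) * PERIOD(m)
        value_before = value_now - absolute_delta   # recover previous value
        result.append(100 * absolute_delta / value_before)
    return result


changes = percent_change(m, PERIOD)
print(changes)
```

Multiplying the per-second rate by the period recovers the difference between consecutive datapoints, so the period chosen on the graph determines how far back the comparison reaches.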

Checking the time in ORACLE APEX 5.1

I am new to APEX and I'm working on a food ordering application where customers are permitted to change their order details only up to 15 minutes after the order has been placed. How can I implement that?

Create a validation on date item. Calculate difference between SYSDATE (i.e. "now") and order date. Subtracting two DATE datatype values results in number of days, so multiply it by 24 (to get hours) and by 60 (to get minutes). If that result is more than 15, raise an error.
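The date arithmetic described here can be sketched in Python to show the unit conversion. This is not APEX or PL/SQL code. The functions minutes_since and can_change_order are hypothetical, and the sketch simply mirrors the idea that the difference is first expressed in days and then scaled to minutes.

```python
from datetime import datetime, timedelta


def minutes_since(order_date, now):
    """Elapsed minutes, computed from a difference expressed in days."""
    days = (now - order_date) / timedelta(days=1)  # fractional days
    return days * 24 * 60


def can_change_order(order_date, now, limit_minutes=15):
    """True while the order is still within the change window."""
    return minutes_since(order_date, now) <= limit_minutes


now = datetime(2024, 1, 1, 12, 0)  # hypothetical "current" time
recent_order = now - timedelta(minutes=10)
old_order = now - timedelta(minutes=20)

print(can_change_order(recent_order, now))  # within the window
print(can_change_order(old_order, now))     # past the window
```

A validation would raise its error in the second case, where more than 15 minutes have passed.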

As an alternative to Littlefoot's answer: timestamp arithmetic returns an interval, so if you use SYSTIMESTAMP instead, your condition could be:
systimestamp - order_date < interval '15' minute
or, even using SYSDATE something like:
order_date > sysdate - interval '15' minute
One note: the 15 minutes seems somewhat arbitrary (a magic number), and it relies on the order not starting to be processed within that time limit. It feels more natural to say something like "you can change your order until the kitchen has started cooking it". There is then no need for any magic numbers, and there is considerably less waste (either of the customer's time, always waiting 15 minutes, or of the kitchen's resources, cooking something they may then have to discard).

How to record total values with rrdtool

I'm pretty sure this question has been asked several times, but either I did not find the correct answer or I didn't understand the solution.
To my current problem:
I have a sensor which measures the time a motor is running.
The sensor is reset after reading.
I'm not interested in the time the motor was running the last five minutes.
I'm more interested in how long the motor was running from the very beginning (or from the last reset).
When storing the values in an rrd, depending on the aggregate function, several values are recorded.
When working with GAUGE, the value read is 3000 (tenths of a second) every five minutes.
When working with ABSOLUTE, the value is 10 every five minutes.
But what I would like to get is something like:
3000 after the first 5 minutes
6000 after the next 5 minutes (last value + 3000)
9000 after another 5 minutes (last value + 3000)
The accuracy of the older values (and slopes) is not so important, but the last value should reflect the time in seconds since the beginning as accurate as possible.
Is there a way to accomplish this?

I don't know whether it fits your needs, but the TREND/TRENDNAN CDEF function may be what you want. Have a look here:
TREND CDEF function

I have now created a small SQLite database with one table and one column in that table.
The table has one row. Every time my cron job runs, I update that row by adding the new reading to the stored value, so the single row and column holds the cumulated value of my sensor. This is then fed into the rrd.
Any other (better) ideas?
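A minimal sketch of this accumulator in Python with the standard sqlite3 module could look like the following. The table name total, the column name value, and the function add_reading are hypothetical, and an in-memory database is used here only so the example is self-contained. A cron job would use a file on disk instead.

```python
import sqlite3

# In-memory database for illustration only; a cron job would open a file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE total (value INTEGER NOT NULL)")
conn.execute("INSERT INTO total (value) VALUES (0)")
conn.commit()


def add_reading(conn, reading):
    """Add one sensor reading to the stored total and return the new total."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE total SET value = value + ?", (reading,))
    return conn.execute("SELECT value FROM total").fetchone()[0]


# Three hypothetical readings of 3000, as in the example from the question.
totals = [add_reading(conn, reading) for reading in (3000, 3000, 3000)]
print(totals)
```

The returned total after each run is the value that would then be written to the rrd.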

The way that I'd tackle this (in Linux) is to write the value to a plain-text file and then use the value from that file for the RRDTool graph. I think that using SQLite (or any other SQL server) would be unnecessarily hard on the system just to keep track of something like this.
