I have some data expressed in minutes. I retrieve the average value using this formula:
=QUERY(A3:D1074,"select C, avg (D) group by C")
An example value I receive is 351.333333333333 (minutes).
Then I convert it to hours using this formula:
=QUOTIENT(N3,60) & ":" & IF(LEN(MOD(N3,60))=1,0,"") & MOD(N3,60)
The result I get is 5:51.3333333333333 (hours:minutes).
Now I'm unable to make a graph using the values expressed in hours. I see an invalid type error when I select the data to be used in the graph.
How can I convert the time values from minutes to hours and still be able to use them in a graph?
The data you get after converting to hours is text, because you concatenate numbers (hours, minutes) with text (:). This can be easily checked by using the TYPE() formula on the result of your conversion: it returns 2, meaning it is text data (1 would be numerical data).
I would suggest simply dividing the data in minutes by 60 and using that in the graph; it should work, as it will be numerical data.
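For example (a quick sketch, assuming the average in minutes is in N3, as in your conversion formula):
=N3/60
For 351.333333333333 minutes this returns roughly 5.8556, a plain number a chart can plot. If you want it displayed as a duration while staying numeric, =N3/(24*60) with the cell formatted as Duration also works.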
try:
=TEXT(E2/24/60/60; "mm:ss")
or directly:
=INDEX(TEXT(QUERY(A3:D, "select avg(D) group by C label avg(D)''")/24/60/60, "[mm]:ss"))
I have a Google Sheet where I'm getting the duration of a YouTube video as follows:
=REGEXEXTRACT(IMPORTXML(A2,"//*[@itemprop='duration']/@content"),"PT(\d+)M(\d+)S")
This gives me two cells with two values (minutes and seconds). However, I want to perform further calculations on them (multiply the minutes by 60 and add the seconds). How can I 'access' these values within a function, if at all?
You want to retrieve the duration time in seconds.
You want to achieve this using the built-in formulas of Google Spreadsheet.
If my understanding is correct, how about these sample formulas?
Sample formula:
=VALUE(REGEXREPLACE(IMPORTXML(A2,"//*[@itemprop='duration']/@content"),"PT(\d+)M(\d+)S","00:$1:$2")*24*3600)
In this sample formula, the cell "A2" has the URL like https://www.youtube.com/watch?v=###.
The retrieved duration is converted to a time value, and that value is then converted to seconds.
For example, when IMPORTXML(A2,"//*[@itemprop='duration']/@content") returns PT1M10S, VALUE(REGEXREPLACE("PT1M10S","PT(\d+)M(\d+)S","00:$1:$2")*24*3600) returns 70.
Even when the time is more than 1 hour, a value like PT123M45S is returned (the minutes simply keep counting past 60), and =VALUE(REGEXREPLACE("PT123M45S","PT(\d+)M(\d+)S","00:$1:$2")*24*3600) returns 7425.
References:
REGEXREPLACE
VALUE
If I misunderstood your question and this was not the result you want, I apologize.
Added:
As another pattern, if you want to use =REGEXEXTRACT(IMPORTXML(A2,"//*[@itemprop='duration']/@content"),"PT(\d+)M(\d+)S"), how about the following formula?
Sample formula:
=QUERY(ARRAYFORMULA(VALUE(REGEXEXTRACT(IMPORTXML(A2,"//*[@itemprop='duration']/@content"),"PT(\d+)M(\d+)S"))),"SELECT Col1*60+Col2 label Col1*60+Col2 ''")
In this formula, the values from the extracted array are used for the calculation.
or like this:
=TEXT(VALUE("00:"&SUBSTITUTE(REGEXREPLACE(
IMPORTXML(A1, "//*[@itemprop='duration']/@content"), "PT|S", ), "M",":")), "[ss]")*1
or shortest:
=REGEXREPLACE(IMPORTXML(A1,"//*[@itemprop='duration']/@content"),
"PT(\d+)M(\d+)S", "00:$1:$2")*86400
I've written some code in Siddhi that logs/prints the average of a batch of the last 100 events, i.e. the average for events 0-100, 101-200, etc. I now want to compare these averages with each other to find some kind of trend. To start, I just want to see whether there is a simple downward or upward trend over a certain number of averages. For example, I want to compare each average value with the upcoming 1-10 average values.
I've looked into the Siddhi documentation but did not find the answer I wanted. I tried some solutions with partitioning, but they did not work. The code below is what I have right now.
define stream HBStream(ID int, DateTime string, Result double);
@info(name = 'Average100Query')
from HBStream#window.lengthBatch(100)
select ID, DateTime, Result, avg(Result)
insert into OutputStream;
Siddhi sequences can be used to match the averages and to identify a trend; see https://siddhi.io/en/v5.1/docs/query-guide/#sequence. Note that the sequence has to run over the OutputStream that carries the computed avgResult, not over the raw HBStream:
from every e1=OutputStream, e2=OutputStream[e2.avgResult > e1.avgResult], e3=OutputStream[e3.avgResult > e2.avgResult]
select e1.ID, e3.avgResult - e1.avgResult as tempDiff
insert into TempDiffStream;
Please note that you have to use a partition to evaluate this pattern per ID if you need the averages to be calculated per sensor. In your app, also use group by if you need the average per sensor; a partitioned sketch follows the query below.
@info(name = 'Average100Query')
from HBStream#window.lengthBatch(100)
select ID, DateTime, Result, avg(Result) as avgResult
group by ID
insert into OutputStream;
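As a rough sketch (reusing the stream definitions above; the query name is just illustrative), a partitioned per-sensor version could look like this:
partition with (ID of HBStream)
begin
    @info(name = 'Average100PerSensorQuery')
    from HBStream#window.lengthBatch(100)
    select ID, DateTime, Result, avg(Result) as avgResult
    insert into OutputStream;
end;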
I am having real problems getting the AWS IoT Analytics Delta Window (docs) to work.
I am trying to set it up so that every day a query is run to get the last 1 hour of data only. According to the docs the schedule feature can be used to run the query using a cron expression (in my case every hour) and the delta window should restrict my query to only include records that are in the specified time window (in my case the last hour).
The SQL query I am running is simply SELECT * FROM dev_iot_analytics_datastore, and if I don't include any delta window I get the records as expected. Unfortunately, when I include a delta expression I get nothing (ever). I have left the data accumulating for about 10 days now, so there are a couple of million records in the database. Since I was unsure what the optimal format would be, I have included the following temporal fields in the entries:
datetime : 2019-05-15T01:29:26.509
(A string formatted using ISO Local Date Time)
timestamp_sec : 1557883766
(A unix epoch expressed in seconds)
timestamp_milli : 1557883766509
(A unix epoch expressed in milliseconds)
There is also a value automatically added by AWS called __dt, which uses the same format as my datetime field except that it seems to be accurate only to within 1 day, i.e. all values entered within a given day have the same value (e.g. 2019-05-15 00:00:00.00).
I have tried a range of expressions (including the suggested AWS expression) from both standard SQL and Presto, as I'm not sure which one is being used for this query. I know they use a subset of Presto for the analytics, so it makes sense that they would use it for the delta window, but the docs simply say '... any valid SQL expression'.
Expressions I have tried so far with no luck:
from_unixtime(timestamp_sec)
from_unixtime(timestamp_milli)
cast(from_unixtime(unixtime_sec) as date)
cast(from_unixtime(unixtime_milli) as date)
date_format(from_unixtime(timestamp_sec), '%Y-%m-%dT%h:%i:%s')
date_format(from_unixtime(timestamp_milli), '%Y-%m-%dT%h:%i:%s')
from_iso8601_timestamp(datetime)
What are the offset and time expression parameters that you are using?
Since delta windows are effectively filters inserted into your SQL, you can troubleshoot them by manually inserting the filter expression into your data set's query.
Namely, applying a delta window filter with -3 minute (negative) offset and 'from_unixtime(my_timestamp)' time expression to a 'SELECT my_field FROM my_datastore' query translates to an equivalent query:
SELECT my_field FROM
(SELECT * FROM "my_datastore" WHERE
(__dt between date_trunc('day', iota_latest_succeeded_schedule_time() - interval '1' day)
and date_trunc('day', iota_current_schedule_time() + interval '1' day)) AND
iota_latest_succeeded_schedule_time() - interval '3' minute < from_unixtime(my_timestamp) AND
from_unixtime(my_timestamp) <= iota_current_schedule_time() - interval '3' minute)
Try using a similar query (with no delta time filter) with correct values for the offset and time expression and see what you get. The (__dt between ...) clause is just an optimization for limiting the scanned partitions; you can remove it for the purposes of troubleshooting.
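For instance, a rough troubleshooting sketch against the datastore from the question, replacing the scheduling functions with a hard-coded one-hour window and using from_unixtime(timestamp_sec) as the time expression (the window bounds are only illustrative):
SELECT * FROM dev_iot_analytics_datastore
WHERE from_unixtime(timestamp_sec) > timestamp '2019-05-15 01:00:00'
AND from_unixtime(timestamp_sec) <= timestamp '2019-05-15 02:00:00'
If that returns the expected rows, the time expression itself is fine and the problem lies elsewhere (e.g. the offset or the schedule).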
Please try the following:
Set query to SELECT * FROM dev_iot_analytics_datastore
Data selection filter:
Data selection window: Delta time
Offset: -1 Hours
Timestamp expression: from_unixtime(timestamp_sec)
Wait for dataset content to run for a bit, say 15 minutes or more.
Check contents
After several weeks of testing and trying all the suggestions in this post, along with many more, it appears that the extremely technical answer was to 'switch it off and back on'. I deleted the whole analytics stack, rebuilt everything with different names, and it now seems to be working!
It's important to note that, even though I have flagged this as the correct answer because it was the actual resolution, both the answers provided by @Populus and @Roger would have been correct had my deployment been functioning as expected.
I found by chance that changing SELECT * FROM datastore to SELECT id1, id2, ... FROM datastore solved the problem.
AWS CloudWatch receives a count of 1 every time I start an image download. I am downloading 1,000s of images (on a cluster of EC2 instances) and would like to track the total progress.
I can't find any documentation on how to plot the cumulative sum of a metric. The AWS CloudWatch Math Expressions looked promising, but they do not have an integrate function.
Currently, I can plot the sum of the started image downloads, but only per period. Ideally, I'd like to plot the integral of that plot.
You can get a cumulative sum over the currently displayed range by using the SUM() function together with a helper series derived from the original metric that contains only the number one (1). Remember, you're looking for a single number in the end, so it's not much of a graph, but you need to turn that single-value sum back into a time series.
Define m1 as your metric. This is the metric you will want to use SUM() on.
Define an expression e1 as m1/m1. This results in a time series with every value equal to 1. This is what will allow you to convert that SUM back into a time series.
Define an expression e2 as SUM(m1) / e1. This is, effectively, the cumulative sum of m1 divided by one at every data point in the original time series. It will be a horizontal line on the graph, with every point on that line equal to the cumulative sum of metric m1. This is required because CloudWatch can only plot a time series on the chart, not a single value.
Make m1 and e1 invisible. You need them, but you don't need to see them.
Finally, change the chart type from Line to Number, since you only wanted the cumulative sum anyway.
The reason you can't use SUM() directly is that it produces a single value. By dividing it by a time series containing all 1's, the entire graph becomes the result of the SUM(). Then, changing the chart to a Number effectively hides all the math and presents only the "final result".
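For illustration, here is a sketch of the same math expressed through the GetMetricData API with boto3; the namespace, metric name, and period are assumptions, not values from the original post:
import datetime
import boto3

cloudwatch = boto3.client("cloudwatch")

end = datetime.datetime.utcnow()
start = end - datetime.timedelta(days=1)

response = cloudwatch.get_metric_data(
    MetricDataQueries=[
        # m1: the raw per-period sums (hidden from the output)
        {
            "Id": "m1",
            "MetricStat": {
                "Metric": {"Namespace": "ImagePipeline", "MetricName": "ImageDownloadStarted"},
                "Period": 300,
                "Stat": "Sum",
            },
            "ReturnData": False,
        },
        # e1: a series of all 1's, used to turn the scalar SUM back into a time series
        {"Id": "e1", "Expression": "m1/m1", "ReturnData": False},
        # e2: the cumulative sum, repeated at every data point
        {"Id": "e2", "Expression": "SUM(m1)/e1", "Label": "Cumulative downloads"},
    ],
    StartTime=start,
    EndTime=end,
)
print(response["MetricDataResults"][0]["Values"])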
Looks like RUNNING_SUM() has been added that does what your need:
Graph with RUNNING_SUM
You can find RUNNING_SUM() under "Add math"->"All functions"
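For example, with the download metric added to the graph as m1, an expression such as RUNNING_SUM([m1]) should plot the cumulative count directly.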
You are correct. All Amazon CloudWatch metrics are for a defined period.
The maximum period for a metric is one day, so this is not suitable for a cumulative counter that you wish to continue beyond one day.
You would need to find an alternate method of storing the count, such as an Amazon DynamoDB table. Use an atomic counter via UpdateItem to increment the count.
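A rough sketch of that idea with boto3 (the table name, key, and attribute names are hypothetical, not something prescribed here):
import boto3

dynamodb = boto3.client("dynamodb")

def increment_download_count(count=1):
    # ADD is applied atomically on the server side, so concurrent EC2
    # instances can all increment the same counter item safely.
    response = dynamodb.update_item(
        TableName="download-progress",                  # hypothetical table
        Key={"counter_id": {"S": "image-downloads"}},   # hypothetical key
        UpdateExpression="ADD download_count :inc",
        ExpressionAttributeValues={":inc": {"N": str(count)}},
        ReturnValues="UPDATED_NEW",
    )
    return int(response["Attributes"]["download_count"]["N"])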
You can also use a very long period.
Change your stat to SUM, and set your metric's period to 7 days. You'll get a time series of 1 point with the cumulative sum of all the downloads.
If you give each download a unique dimension value, you can keep your queries separate.
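One way to do that (a sketch only; the namespace, metric name, and dimension name are assumptions) is to publish each job's downloads under its own dimension value:
import boto3

cloudwatch = boto3.client("cloudwatch")

def record_download_started(job_id):
    # Each job gets its own dimension value, so per-job sums stay separate.
    cloudwatch.put_metric_data(
        Namespace="ImagePipeline",
        MetricData=[{
            "MetricName": "ImageDownloadStarted",
            "Dimensions": [{"Name": "JobId", "Value": job_id}],
            "Value": 1,
            "Unit": "Count",
        }],
    )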
I have a dataset that I am trying to manipulate in GraphLab. I want to convert a UNIX epoch timestamp from the input file (converted to an SFrame) into a human-readable format so I can do analysis based on hour of day and day of week.
time_array is the column/feature of the SFrame sf representing the timestamp; I have broken out just the epoch time to simplify things. I know how to convert the time for one row, but I want a vector operation. Here is what I have for one row.
import datetime

time_array = sf['timestamp']
datetime.datetime.fromtimestamp(time_array[0]).strftime('%Y-%m-%d %H')
You can also get parts of the time from the timestamp to create another column by which to filter (e.g., get the hour):
sf['hour'] = [x.strftime('%H') for x in sf['timestamp']]
So after staring at this for a while and then posting the question, it came to me; hopefully someone else can benefit as well. Use the .apply method with the datetime.datetime.fromtimestamp() function:
sf['date_string'] = sf['timestamp'].apply(lambda x: datetime.datetime.fromtimestamp(x).strftime('%Y-%m-%d %H'))
You can also use the split_datetime API to split the timestamp into multiple columns:
sf.split_datetime('timestamp', limit=['hour','minute'])