Visualize time values over days in QuickSight - amazon-web-services

I have an event dataset in QuickSight, where each record has a timestamp field as following:
last_day_record_ts |
-------------------|
2020-01-19 05:46:55|
2020-01-20 05:55:37|
2020-01-21 06:00:12|
2020-01-22 06:12:57|
2020-01-23 06:02:15|
2020-01-24 06:15:35|
2020-01-25 06:20:05|
2020-01-26 05:55:48|
I want to build a visualization of time values over days as a line chart as following:
However, I find it difficult to get this in AWS QuickSight. Any ideas?
Instead of desired result QuickSight persistently gives just aggregated record values (i.e 1 for each day) but not the time values itself...
UPDATE. The workaround I found for now - to add calculated fields to the Data Set in order to get numeric values instead of timestamp ones.
Calculated fields:
day_midnight | truncDate('DD',{last_day_record_ts})
time_diff_in_hours_dec | abs(dateDiff({last_day_record_ts},{day_midnight},"MI")) / 60
time_diff_in_hours_int | decimalToInt({time_diff_in_hours_dec})
time_diff_in_min | ({time_diff_in_hours_dec} - {time_diff_in_hours_int}) * 60
The only problem I still cannot solve - to get Y axis labels in HH:MM format as in green rectangle. For now, it's numeric decimals...

Unfortunately, (after many attempts of my own) this type of visual does not appear to be possible in Quicksight at the time of writing.
Quicksight has many nice features, but it's still missing some (very basic imo) things that make it limiting for anyone working with data that is outside the expected use-cases.

Related

Is there a way to just sum a google cloud metric within a time period?

With the Metrics Explorer in the Google Cloud Platform, is it possible to sum a metric within a defined time period? I have custom gauge metrics set up for various data that I care for. And I am just trying to sum them up over the course of days, weeks, and months. It is important that I count everything between the first of the month and the last of the month. But instead, the aligner seems to pick an arbitrary time to align to (e.g. the 4th of the month) and I can't be certain that I am getting the correct values.
For example, if I try to use delta, rate, or sum within a time window of May 1st 00:00 to June 1st 00:00, with an alignment of 31 days, I will see two numbers. One will be for the 14th of May and one for the 14th of June and they will add up to a very large number.
fetch generic_task
| metric 'custom.googleapis.com/go-metrics/request_counter'
| filter (metric.environment == 'prod')
| align delta(31d)
| every 31d
| group_by [metric.environment],
[value_request_counter_aggregate: aggregate(value.request_counter)]
That isn't so bad but if I change alignment, the sum of those numbers don't add up, such as if I tried 7d or 1d instead. Like, the number is twice as much as if it were counting data from outside the time range that I specified. And worse, upon reload of the webpage, it picks different days/times it will align on too.
To get around this, I have been reduced to setting the alignment to a fine amount and just tolerating a small amount of error/inconsistency.

AWS CloudWatch: visualise more time series

I would like to visualise event occurrence changes in time.
Use case:
Let's say my logs contains 2 types of events (eventA, eventB).
I'm interested in a line graph that shows the number of events per hours. (line#1: dataA1, dataA2... ; line#2: dataB1, dataB2...)
What I'm aware of:
Query the logs: fields #timestamp, eventName | stats count() by bin(1h), eventName | sort bin(1h) asc
The above query gives all the data for creating the desired graph (eg: [bin(1h)], [count()], [eventName])
If I remove the eventName field form display I get a log-table with the correct data, but the line graph is showing datapoints mixed (eg: dataA1, dataA2, dataB3, dataA4, dataB5)
The question:
Is it possible to generate a line graph with more series in it?
If yes, what parametrization do I need?
See https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_Insights-Visualizing-Log-Data.html
Visualizing time series data
Time series visualizations work for queries with the following characteristics:
The query contains one or more aggregation functions. For more information, see Aggregation Functions in the Stats Command.
The query uses the bin() function to group the data by one field.
These queries can produce line charts, stacked area charts, bar charts, and pie charts.
You can't use line chart for your example because you can only use single bin() grouping to produce time series. You can however use e.g. pie chart for your use case.
Alternatively if applicable to your use case, you can start producing logs in different format as
{
"eventA": 1,
"eventB": 0
}
Then you can write query as
stats sum(eventA), sum(eventB) by bin(1h)

AWS Forecast cannot train the predictor due to missing data

This question is close, but doesn't quite help me with a similar issue as I am using a single data set and no related time series.
I am using AWS Forecast with a single time series dataset (no related data, just the main DS). It is a daily data set with about 10 years of data ranging from 2010-2020.
I have 3572 data points in the original data set; I manually filled missing data to ensure there were no missing days in the date range for a total of 3739 data points. I lopped off everything in 2020 to create a validation dataset and then configured the predictor for a 180 day Forecast. I keep getting the following error:
Unable to evaluate this dataset because there is missing data in the evaluation window for all items. Ensure that there is complete data for at least one item in the evaluation window starting from 2019-03-07T00:00:00 up to 2020-01-01T00:00.
There is definitely no missing data, I've double and triple checked the date range and data fill and every day between start and end dates has a data point. I also tried adding a data point for 1/1/2020 (it ended at 12/31/2019) and I continue to get this error. I can't figure out what it's asking me for, except that maybe I'm missing something in my math about the forecast Horizon and Backtest window offset?
Dataset example:
Brief model parameters (can share more if I'm missing something pertinent):
Total data points in training data: 3479
forecastHorizon = 180
create_predictor_response=forecast.create_predictor(PredictorName=predictorName,
ForecastHorizon=forecastHorizon,
PerformAutoML= True,
PerformHPO=False,
EvaluationParameters= {"NumberOfBacktestWindows": 1,
"BackTestWindowOffset": 180},
InputDataConfig= {"DatasetGroupArn": datasetGroupArn},
FeaturizationConfig= {"ForecastFrequency": 'D'
I noticed you don't have entry for 6/24/10 (this american date format is the worst btw)
I faced a similar problem when leaving out days (assuming you're modelling in daily frequency) just like that and having the Forecast automatic filling of gaps to nan values (as opposed to zero which is the default). I suggest you:
pre-fill literally every date within the range of training data (and of forecast window, if using related data)
choose zero as the option for automatically filling of missing values. I think mean or any other float value would also work for that matter
let me know if that works! I am also using Forecast and it's good to keep track of possible problems and solutions

Why Amazon Forecast cannot train the predictor?

While training my predictor I came across this error and I got stuck how to fix it.
I have two data-series, a "Target time-series data" with 9234 rows and a single "item_id" and a second one that is "Related time-series data" with the same number of rows as I only have a single id.
I'm setting de data with a window of 180 days, what is exactly the difference between the second and the first number that has appeared on the error, 9414 - 9234 = 180.
We were unable to train your predictor.
Please ensure there are no missing values for any items in the related time series, All items need data until 2020-03-15 00:00:00.0. For example, following items have missing data: item: brl only has 9234/9414 required datapoints starting 1994-06-07 00:00:00.0, please refer to documentation for additional details.
Once my data don't have missing data and it's on a daily basis why is it returning this error?
My data starts on 1994-06-07 and ends on 2019-09-17. Why should I have 9414 data points rather than 9234?
Should I take out 180 days in my "Target time-series data"?
The future values of the related time-series data must be known.
Example of a good related-time series: You know past and future days in which marketing has or will send email newsletters promoting the product you're forecasting. You can use this data as a related-time series.
Example of a bad related-time series: You notice that Google searches for your brand correlated with the sale of your product. As a result you want to use it as a related-time series. Since you don't know how many searches will occur in the future, so you can't use this as a related time series.
In you case, You have TARGET_TIME_SERIES data for 9414 days and you want to predict demand for the next 180 days. That means your RELATED_TIME_SERIES data should be 9594 days.
Edit: I have not tested this with amazon's forecasting product. I'm basing my answer on working with Facebook Prophet (which is one of the models amazon forcast uses). Please let me know if my solution worked.

AWS IoT Analytics Delta Window

I am having real problems getting the AWS IoT Analytics Delta Window (docs) to work.
I am trying to set it up so that every day a query is run to get the last 1 hour of data only. According to the docs the schedule feature can be used to run the query using a cron expression (in my case every hour) and the delta window should restrict my query to only include records that are in the specified time window (in my case the last hour).
The SQL query I am running is simply SELECT * FROM dev_iot_analytics_datastore and if I don't include any delta window I get the records as expected. Unfortunately when I include a delta expression I get nothing (ever). I left the data accumulating for about 10 days now so there are a couple of million records in the database. Given that I was unsure what the optimal format would be I have included the following temporal fields in the entries:
datetime : 2019-05-15T01:29:26.509
(A string formatted using ISO Local Date Time)
timestamp_sec : 1557883766
(A unix epoch expressed in seconds)
timestamp_milli : 1557883766509
(A unix epoch expressed in milliseconds)
There is also a value automatically added by AWS called __dt which is a uses the same format as my datetime except it seems to be accurate to within 1 day. i.e. All values entered within a given day have the same value (e.g. 2019-05-15 00:00:00.00)
I have tried a range of expressions (including the suggested AWS expression) from both standard SQL and Presto as I'm not sure which one is being used for this query. I know they use a subset of Presto for the analytics so it makes sense that they would use it for the delta but the docs simply say '... any valid SQL expression'.
Expressions I have tried so far with no luck:
from_unixtime(timestamp_sec)
from_unixtime(timestamp_milli)
cast(from_unixtime(unixtime_sec) as date)
cast(from_unixtime(unixtime_milli) as date)
date_format(from_unixtime(timestamp_sec), '%Y-%m-%dT%h:%i:%s')
date_format(from_unixtime(timestamp_milli), '%Y-%m-%dT%h:%i:%s')
from_iso8601_timestamp(datetime)
What are the offset and time expression parameters that you are using?
Since delta windows are effectively filters inserted into your SQL, you can troubleshoot them by manually inserting the filter expression into your data set's query.
Namely, applying a delta window filter with -3 minute (negative) offset and 'from_unixtime(my_timestamp)' time expression to a 'SELECT my_field FROM my_datastore' query translates to an equivalent query:
SELECT my_field FROM
(SELECT * FROM "my_datastore" WHERE
(__dt between date_trunc('day', iota_latest_succeeded_schedule_time() - interval '1' day)
and date_trunc('day', iota_current_schedule_time() + interval '1' day)) AND
iota_latest_succeeded_schedule_time() - interval '3' minute < from_unixtime(my_timestamp) AND
from_unixtime(my_timestamp) <= iota_current_schedule_time() - interval '3' minute)
Try using a similar query (with no delta time filter) with correct values for offset and time expression and see what you get, The (_dt between ...) is just an optimization for limiting the scanned partitions. You can remove it for the purposes of troubleshooting.
Please try the following:
Set query to SELECT * FROM dev_iot_analytics_datastore
Data selection filter:
Data selection window: Delta time
Offset: -1 Hours
Timestamp expression: from_unixtime(timestamp_sec)
Wait for dataset content to run for a bit, say 15 minutes or more.
Check contents
After several weeks of testing and trying all the suggestions in this post along with many more it appears that the extremely technical answer was to 'switch off and back on'. I deleted the whole analytics stack and rebuild everything with different names and it now seems to now be working!
Its important that even though I have flagged this as the correct answer due to the actual resolution. Both the answers provided by #Populus and #Roger are correct had my deployment being functioning as expected.
I found by chance that changing SELECT * FROM datastore to SELECT id1, id2, ... FROM datastore solved the problem.