AWS CloudWatch: visualise multiple time series

I would like to visualise how event occurrences change over time.
Use case:
Let's say my logs contain 2 types of events (eventA, eventB).
I'm interested in a line graph that shows the number of events per hour. (line #1: dataA1, dataA2, ...; line #2: dataB1, dataB2, ...)
What I'm aware of:
Query the logs: fields @timestamp, eventName | stats count() by bin(1h), eventName | sort bin(1h) asc
The above query gives all the data needed for the desired graph (e.g. the bin(1h), count(), and eventName columns).
If I remove the eventName field from display, I get a log table with the correct data, but the line graph shows the data points of both events mixed into a single series (e.g. dataA1, dataA2, dataB3, dataA4, dataB5).
The question:
Is it possible to generate a line graph with multiple series in it?
If yes, what parametrization do I need?

See https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_Insights-Visualizing-Log-Data.html
Visualizing time series data
Time series visualizations work for queries with the following characteristics:
The query contains one or more aggregation functions. For more information, see Aggregation Functions in the Stats Command.
The query uses the bin() function to group the data by one field.
These queries can produce line charts, stacked area charts, bar charts, and pie charts.
You can't use a line chart for your example because you can only use a single bin() grouping to produce a time series. You can, however, use e.g. a pie chart for your use case.
Alternatively, if applicable to your use case, you can start producing logs in a different format, such as
{
"eventA": 1,
"eventB": 0
}
Then you can write the query as
stats sum(eventA), sum(eventB) by bin(1h)
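If you control the log producer, here is a minimal sketch of emitting records in that shape with Python's standard logging module; the helper name and logging setup are purely illustrative, only the eventA/eventB keys come from the example above.
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("events")

def log_event(event_name):
    # One-hot encode the event type so that
    # `stats sum(eventA), sum(eventB) by bin(1h)` yields one series per event.
    record = {
        "eventA": 1 if event_name == "eventA" else 0,
        "eventB": 1 if event_name == "eventB" else 0,
    }
    logger.info(json.dumps(record))

log_event("eventA")
log_event("eventB")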

Related

Importrange + Query + Matches + Regexp

I am trying to filter data from a different sheet by a specific account number pattern. However, the formula below doesn't give any results:
=query(importrange(Setup!B1,"Sheet1!A2:F"),"Select * where Col4 matches '\d\d-\d\d\d\d-[14-7]\d\d\d'",1)
This is supposed to filter for all accounts where the 1st digit in the 3rd group of numbers is 1, 4, 5, 6, or 7. The account numbers are all the same width, following the format xx-xxxx-xxxx.
Try using this as your match criteria:
\d{2}-\d{4}-[14-7]\d{3}
Also, while I can't see your data, make sure you actually have a header in the first row of your IMPORTRANGE results (which you've requested with the 1 at the end of the QUERY). If you don't actually have headers, the 1 will leave you with one more result than you want; if that is the case, just remove the ,1 from the end of the QUERY.
If this doesn't produce the results you want, it may be due to mixed data types in your raw data that are being filtered out by the QUERY. In that case, you can try using FILTER and REGEXMATCH instead:
=ArrayFormula(FILTER(IMPORTRANGE(Setup!B1,"Sheet1!A2:F"),REGEXMATCH(IMPORTRANGE(Setup!B1,"Sheet1!D2:D"),"\d{2}-\d{4}-[14-7]\d{3}")))
It is always hard to write complex formulas sight unseen. If none of these solutions (which work in my local sheet) produce the results you expect, I encourage you to share a link to both of your sheets. The raw data sheet being called by IMPORTRANGE can be "View Only"; but you'll want to set the Share permission on the second sheet (the one with the IMPORTRANGE formula itself) to "Anyone with the link..." and "Editor," so that those here can access it to test.
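If you want to sanity-check the pattern outside Sheets, here is a quick sketch in Python; the \d and character-class syntax used here behaves the same in Python's re as in the RE2 syntax REGEXMATCH uses, and the sample account numbers are made up.
import re

# QUERY's `matches` tests the whole cell value, so fullmatch mirrors that behaviour.
pattern = re.compile(r"\d{2}-\d{4}-[14-7]\d{3}")

samples = [
    "12-3456-4789",  # third group starts with 4 -> keep
    "12-3456-2789",  # third group starts with 2 -> drop
    "98-0001-7123",  # third group starts with 7 -> keep
]

for account in samples:
    print(account, bool(pattern.fullmatch(account)))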

Visualize time values over days in QuickSight

I have an event dataset in QuickSight, where each record has a timestamp field, as follows:
last_day_record_ts |
-------------------|
2020-01-19 05:46:55|
2020-01-20 05:55:37|
2020-01-21 06:00:12|
2020-01-22 06:12:57|
2020-01-23 06:02:15|
2020-01-24 06:15:35|
2020-01-25 06:20:05|
2020-01-26 05:55:48|
I want to build a visualization of the time values over days as a line chart.
However, I find it difficult to get this in AWS QuickSight. Any ideas?
Instead of the desired result, QuickSight persistently gives just aggregated record counts (i.e. 1 for each day), not the time values themselves...
UPDATE: The workaround I found for now is to add calculated fields to the data set in order to get numeric values instead of timestamps.
Calculated fields:
day_midnight | truncDate('DD',{last_day_record_ts})
time_diff_in_hours_dec | abs(dateDiff({last_day_record_ts},{day_midnight},"MI")) / 60
time_diff_in_hours_int | decimalToInt({time_diff_in_hours_dec})
time_diff_in_min | ({time_diff_in_hours_dec} - {time_diff_in_hours_int}) * 60
The only problem I still cannot solve is getting the Y-axis labels in HH:MM format. For now, they are numeric decimals...
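For what it's worth, this is the arithmetic those calculated fields perform, sketched in Python with an illustrative timestamp:
from datetime import datetime

last_day_record_ts = datetime(2020, 1, 19, 5, 46, 55)
day_midnight = last_day_record_ts.replace(hour=0, minute=0, second=0, microsecond=0)

# Decimal hours since midnight, then split into whole hours and remaining minutes.
time_diff_in_hours_dec = (last_day_record_ts - day_midnight).total_seconds() / 3600
time_diff_in_hours_int = int(time_diff_in_hours_dec)
time_diff_in_min = (time_diff_in_hours_dec - time_diff_in_hours_int) * 60

print(f"{time_diff_in_hours_int:02d}:{int(time_diff_in_min):02d}")  # 05:46
If I read the QuickSight docs correctly, dateDiff with "MI" works in whole minutes, so the decimal value in the data set will be slightly coarser than this, but the decomposition is the same.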
Unfortunately (after many attempts of my own), this type of visual does not appear to be possible in QuickSight at the time of writing.
QuickSight has many nice features, but it's still missing some (very basic, IMO) things, which makes it limiting for anyone working with data outside the expected use cases.

Stackdriver Logs-Based Metrics - need sum over alignment period

We have some Stackdriver log entries that look something like this:
{
insertId: "xyz"
jsonPayload: {
countOfApples: 100
// other stuff
}
// other stuff
}
We would like to be able to set up a log-based metric that tells us the total number of apples seen in the past 10 minutes (or any alignment period), but I have, thus far, been unable to find a means of doing so despite reading through the documentation.
Attempt 1:
Filter for those log entries where countOfApples is specified and create a Counter metric with countOfApples as a label.
Having done this, I can filter based on countOfApples being above or below a certain value, but I cannot see a means of aggregating based on this value. All the aggregation options seem to apply to the number of log entries matching the filter over the alignment period.
Attempt 2:
Filter for those log entries where countOfApples is specified and create a distribution metric with the Field Name set to jsonPayload.countOfApples.
This seems to get closer because I can now see the apple count in the Metrics Explorer, but I cannot find the correct combination of aligner/reducer to give me just the total number of apples over the period. Selecting Aligner: delta & Reducer: sum results in an error message:
This aggregation does not produce a valid data type for a Line plot type. Click here to switch the aligner to sum and the reducer to 99th percentile
Is it possible to just monitor the total sum of all these values over each alignment period?
As of 2019/05/03, it is not possible to create a counter metric based on the values stored in the logs. Putting the values into a label simply exposes them as strings, which lets you filter but not perform aggregations based on those values. According to the documentation, a counter metric counts log entries, not the values in those log entries. As you've noticed, there aren't enough operations available on distribution metrics to do what you want.
For now, your best bet is to write your own custom metric based on those log values. You can do this by exporting your logs to Cloud Pub/Sub and writing some code to process the logs from Pub/Sub and send custom metrics. Alternatively, you could try to configure the Stackdriver monitoring agent to extract the values using the tail plugin, and send them as custom metrics.
If you just need to graph and explore the values (rather than, e.g., use them for alerting), you could try Cloud Datalab.
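Here is a rough sketch of that Pub/Sub route in Python, assuming a log sink already exports the matching entries to a subscription. The project ID, subscription ID, and custom metric type are placeholders, and client-library details may differ slightly between versions.
import json
import time

from google.cloud import monitoring_v3, pubsub_v1

PROJECT_ID = "my-project"                            # placeholder
SUBSCRIPTION_ID = "apple-logs-sub"                   # placeholder, fed by a log sink
METRIC_TYPE = "custom.googleapis.com/apples/count"   # placeholder custom metric

metric_client = monitoring_v3.MetricServiceClient()
project_name = f"projects/{PROJECT_ID}"

def write_apple_count(count):
    # Write one point of a custom time series carrying the extracted value.
    now = time.time()
    interval = monitoring_v3.TimeInterval({"end_time": {"seconds": int(now)}})
    series = monitoring_v3.TimeSeries()
    series.metric.type = METRIC_TYPE
    series.resource.type = "global"
    series.resource.labels["project_id"] = PROJECT_ID
    series.points = [
        monitoring_v3.Point({"interval": interval, "value": {"int64_value": count}})
    ]
    metric_client.create_time_series(name=project_name, time_series=[series])

def callback(message):
    # Each Pub/Sub message is one exported log entry in JSON form.
    entry = json.loads(message.data)
    count = entry.get("jsonPayload", {}).get("countOfApples")
    if count is not None:
        write_apple_count(int(count))
    message.ack()

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)
streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
streaming_pull_future.result()  # blocks the main thread and keeps pulling
Depending on how you want the aligner to treat the data, you may prefer to pre-aggregate counts in the consumer and write them less frequently rather than one point per log entry.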
If anyone is still looking for how to solve this: it seems that it's now possible to do a sum aggregation on a distribution metric using the sum_from function. Example:
fetch k8s_container
| metric 'logging.googleapis.com/user/tracking-data-len'
| group_by [], sum(sum_from(value))

Joining inputs for a complicated output

I'm new to Azure Stream Analytics. I'm using it to collect feedback from users. I'm sending about 50 events per second to Azure, and I'm trying to get a combined result from two inputs, but I couldn't get a working output. My problem is with the SQL query for the output.
Here is what I'm currently sending to the inputs.
Recommandations:
{"appId":"1","sequentialId":"28","ItemId":"1589018","similaristyValue":"0.104257207028537","orderId":"0"}
ShownLog:
{"appId":"1","sequentialId":"28","ItemId":"1589018"}
I need to join them on sequentialId and ItemId and calculate the difference between the two ordered sums, as described below.
For example: I send 10 Recommandations events and after that (say, 2 seconds later) I send 3 ShownLog events. What I need to do is get the sum of the first 3 events' similaristyValue from Recommandations, ordered by orderId (first 3 because I sent 3 ShownLog events). I also need to get the sum of the similarity values for the ShownLog events. At the end I need an output like this (for every sequentialId):
sequentialID Difference
168 1.21
What I've done so far: I save all the inputs to my Azure SQL database, and I've managed to write the SQL I want. Here is the MSSQL query for it:
declare @sumofSimValue float;
declare @totalItemCount int;
declare @seqId float;

select
    @sumofSimValue = sum(b.[similarityValue]),
    @totalItemCount = count(*),
    @seqId = a.sequentialId
from EventHubShownLog a
inner join EventHubResult b
    on a.sequentialId = b.sequentialId and a.ItemId = b.ItemId
group by a.sequentialId

--select @sumofSimValue, @totalItemCount, @seqId

SELECT @seqId, SUM([similarityValue]) - @sumofSimValue
FROM (
    SELECT TOP(@totalItemCount) [similarityValue]
    FROM [EventHubResult]
    where sequentialId = @seqId
    order by orderId
) AS T
But it gives lots of errors in Stream Analytics, and it doesn't fit the way Azure Stream Analytics works. I hope I've explained the problem clearly.
Can you tell me how I can do this for my system? How can I use time windows, or how can I join the inputs properly?
For every shown log, you have to select the sum of the similarity value. Is that the intention? Why not just join and select the sum? It would only select as many rows as there are shown logs.
One thing to decide is the maximum time difference between recommendation events and shown-log events; with that, you can use an Azure Stream Analytics join: https://msdn.microsoft.com/en-us/library/azure/dn835026.aspx

How to create Google Chart with lines (series) of different lengths?

How do I create a Google Line Chart that displays two or more lines, with a different number of data points in each series?
For instance, I want to create a chart with 2 lines, one showing the expected values over time and another showing the actual values over time. The date range and expected values are known in advance so I can fully graph them, but the actual values may not be fully known yet (e.g. the date range covers some dates in the future).
I found the answer in this SO question. The solution is to use "_" (or "__", depending on the encoding of the data values) to indicate "no value".
For instance, one data series might be 10,7,3,1 and the other might be 10,6,_,_.