DynamoDB with daily/weekly/monthly aggregated values - amazon-web-services

My application creates a log file every 10 minutes, which I want to store in DynamoDB in aggregated form, e.g. 144 log files per day, 1008 log files per week, or ~4400 log files per month.
I have different partition keys, but for the sake of simplicity I use only a single partition key in the following examples.
The straightforward solution would be to have separate tables, e.g.
Table "TenMinLogsDay":
id (=part.key) | date (=sort key) | cntTenMinLogs | data
-------------- | ---------------- | ------------- | -------------------------------
1 | 2017-04-30 | 144 | some serialized aggregated data
1 | 2017-05-01 | 144 | some serialized aggregated data
1 | 2017-05-02 | 144 | some serialized aggregated data
1 | 2017-05-03 | 144 | some serialized aggregated data
Table "TenMinLogsWeek":
id (=part.key) | date (=sort key) | cntTenMinLogs | data
-------------- | ---------------- | ------------- | -------------------------------
1 | 2017-05-01 | 1008 | some serialized aggregated data
1 | 2017-05-08 | 1008 | some serialized aggregated data
1 | 2017-05-15 | 1008 | some serialized aggregated data
Table "TenMinLogsMonth":
id (=part.key) | date (=sort key) | cntTenMinLogs | data
-------------- | ---------------- | ------------- | -------------------------------
1 | 2017-05-01 | 4464 | some serialized aggregated data
1 | 2017-06-01 | 4320 | some serialized aggregated data
1 | 2017-07-01 | 4464 | some serialized aggregated data
I would, however, prefer a combined table. Out of the box, DynamoDB does not seem to support this.
Also, I want to query either the daily OR the weekly OR the monthly aggregated items, so I don't want to use the filter feature for this.
The following solution would be possible, but seems like a poor hack:
Table "TenMinLogsCombined":
id (=part.key) | date (=sort key) | week (=LSI sort key) | month (=LSI sort key) | cntTenMinLogs | data
-------------- | ---------------- | -------------------- | --------------------- | ------------- | -----
1 | 2017-04-30 | (empty) | (empty) | 144 | ...
1 | 2017-05-01 | (empty) | (empty) | 144 | ...
1 | 0017-05-01 | 2017-05-01 | (empty) | 1008 | ...
1 | 1017-05-01 | (empty) | 2017-05-01 | 4464 | ...
1 | 2017-05-02 | (empty) | (empty) | 144 | ...
1 | 2017-05-03 | (empty) | (empty) | 144 | ...
Explanation:
By using the years "0017" and "1017" instead of "2017", I can query a date range, e.g. 2017-05-01 to 2017-05-04, and DynamoDB won't read the items starting with 0017 or 1017.
For week or month range queries, no such hack is required, as empty LSI sort keys are possible.
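The effect of the year prefixes can be checked without touching DynamoDB. The following is a minimal sketch in plain Python, assuming the sort key is stored as a string and that a range condition compares strings lexicographically (which is how DynamoDB orders string sort keys). `ITEMS` and `query_date_range` are illustrative names, not part of any AWS API.

```python
# Items of the combined table for partition key id = 1 (payload omitted).
ITEMS = [
    {"id": 1, "date": "2017-04-30", "cntTenMinLogs": 144},
    {"id": 1, "date": "2017-05-01", "cntTenMinLogs": 144},
    {"id": 1, "date": "0017-05-01", "cntTenMinLogs": 1008},  # weekly item
    {"id": 1, "date": "1017-05-01", "cntTenMinLogs": 4464},  # monthly item
    {"id": 1, "date": "2017-05-02", "cntTenMinLogs": 144},
    {"id": 1, "date": "2017-05-03", "cntTenMinLogs": 144},
]


def query_date_range(items, partition_id, start, end):
    """Mimic `id = :id AND date BETWEEN :start AND :end` on string sort keys."""
    matches = [
        item for item in items
        if item["id"] == partition_id and start <= item["date"] <= end
    ]
    # Results are returned ordered by sort key, as a key-based query would do.
    return sorted(matches, key=lambda item: item["date"])


daily = query_date_range(ITEMS, 1, "2017-05-01", "2017-05-04")
print([item["date"] for item in daily])
# ['2017-05-01', '2017-05-02', '2017-05-03']
```

Because "0017-..." and "1017-..." sort before every "2017-..." key, the weekly and monthly items fall outside the daily range and are never matched.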
Does anybody know of a better way to achieve this?

Related

How to write an SQL query, for use in C++ code, that counts the borrowing records for every month

I want to write a query that counts the borrowing records for each month, but I saved my dates as Unix time.
Table name: borrow
Attributes: borrowingID, dateOfBorrow, dateOfReturn, statusBook
For example, dateOfBorrow is 167077440, and I just want the count for each specific month (Jan, Feb, etc.).
I am expecting:
| Month | Total |
| ------| ----- |
| Jan | 2 |
| Feb | 5 |
| Mar | 5 |
...etc
select from_unixtime(167077440),from_unixtime(167077440,'%b')
+--------------------------+-------------------------------+
| from_unixtime(167077440) | from_unixtime(167077440,'%b') |
+--------------------------+-------------------------------+
| 1975-04-18 19:24:00 | Apr |
+--------------------------+-------------------------------+
1 row in set (0.001 sec)
See manual https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_from-unixtime and https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_date-format
But are you really interested in 1975?
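Building on `from_unixtime`, the monthly count itself is a `GROUP BY` on the month extracted from dateOfBorrow. The sketch below uses Python with an in-memory SQLite database as a stand-in for MySQL, so it calls SQLite's `strftime(..., 'unixepoch')` instead of `from_unixtime`. The sample rows and the helper names `ts` and `monthly_totals` are made up for illustration, months are computed in UTC, and rows from different years land in the same month bucket.

```python
import sqlite3
from datetime import datetime, timezone

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def ts(year, month, day):
    """Unix time (UTC) for a calendar date; used only to build sample rows."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def monthly_totals(conn):
    """Return (month abbreviation, number of borrow rows) in calendar order."""
    rows = conn.execute(
        "SELECT CAST(strftime('%m', dateOfBorrow, 'unixepoch') AS INTEGER) AS m, "
        "COUNT(*) FROM borrow GROUP BY m ORDER BY m"
    ).fetchall()
    return [(MONTHS[m - 1], total) for m, total in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE borrow (borrowingID INTEGER PRIMARY KEY, dateOfBorrow INTEGER)"
)
sample = [ts(2023, 1, 5), ts(2023, 1, 20), ts(2023, 2, 1),
          ts(2023, 2, 14), ts(2023, 3, 9)]
conn.executemany("INSERT INTO borrow (dateOfBorrow) VALUES (?)",
                 [(t,) for t in sample])
print(monthly_totals(conn))
```

In MySQL the same grouping would presumably use `from_unixtime(dateOfBorrow, '%b')` in place of the `strftime` call; that variant is not tested here.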

How do I repeat row labels in a matrix?

I have data showing me the dates grouped like this:
I had to remove the Customer Description detail for confidentiality reasons.
How do I repeat the date column the same way you repeat the Row Labels in an Excel Pivot?
I've looked, but couldn't find a solution to this - this option should be available.
EDIT
When you have the following source data in Excel:
Date | Customer | Item Description | Qty Out | Unit Price | Sales
---------- | ---------- | ---------------- | ------- | ---------- | ---------
14/08/2020 | Customer 1 | Item 11 | 4.00 | 65.00 | 260.00
14/08/2020 | Customer 2 | Item 12 | 56.00 | 12.00 | 672.00
14/08/2020 | Customer 3 | Item 13 | 64.00 | 35.00 | 2,240.00
14/08/2020 | Customer 4 | Item 14 | 29.00 | 65.00 | 1,885.00
15/08/2020 | Customer 2 | Item 15 | 746.00 | 12.00 | 8,952.00
15/08/2020 | Customer 3 | Item 16 | 14.00 | 75.00 | 1,050.00
15/08/2020 | Customer 4 | Item 17 | 45.00 | 741.00 | 33,345.00
15/08/2020 | Customer 5 | Item 18 | 456.00 | 125.00 | 57,000.00
15/08/2020 | Customer 6 | Item 19 | 925.00 | 17.00 | 15,725.00
16/08/2020 | Customer 4 | Item 20 | 6.00 | 532.00 | 3,192.00
16/08/2020 | Customer 5 | Item 21 | 56.00 | 94.00 | 5,264.00
16/08/2020 | Customer 6 | Item 22 | 546.00 | 37.00 | 20,202.00
You then pivot this data using Microsoft Excel, where you get the following:
You then choose the option to Repeat Item Labels as can be seen below:
After selecting this, you get the results I expect to see in Power BI:
Is there no function like this available in Power BI?
Just adding this for your reference as a workaround. Create a custom column in the Power Query Editor:
date_customer = Date.ToText([Date]) &" : "& [Customer]
Then add both Date and date_customer to the Matrix row level. The output is as below (using your sample data):
ANOTHER OPTION
Another option is to add Date and Customer to the Matrix rows; the output will be as below (using your sample data):
This is also a meaningful output, as the dates are shown as group headers. But if you are required to show the date redundantly on every row, consider the first option.

Filter out outliers dynamically using PERCENTILE

I'm building a sales dashboard in PowerBI.
I have a Sales table.
My source of data is declarative, so I have a few extreme values caused by human errors and mistypes, etc.
Let's say I want to build a histogram with:
On the X axis, the stock aging of each sale, i.e. "how long the product has been in stock at the time of sale". It is given by the [Product_Age] column.
On values, the number of sales.
What I want to do is exclude the top 1% extreme values from my calculations (average, etc.) and visualizations.
I've created a measure:
SalesByAge_Adjusted =
VAR TEMP =
    FILTER(
        SALES;
        VAR StockAgingMAX =
            PERCENTILE.INC(
                SALES[Sales_Age];
                0,99
            )
        RETURN
            SALES[Sales_Age] < StockAgingMAX
    )
RETURN
    COUNTROWS(TEMP)
It uses PERCENTILE.INC to get the 99th percentile of Sales_Age values in the current context and I try to use it as a filter.
However, it just won't work.
I can display the measure on its own and see how many sales I have. But as soon as I drag and drop "Sales_Age" to summarize the values, it shows nothing.
I have created the following table as an example.
+-------+--------+
| Axis | Values |
+-------+--------+
| 1 | 1067 |
| 2 | 1725 |
| 4 | 298 |
| 8 | 402 |
| 16 | 1848 |
| 32 | 1395 |
| 64 | 1116 |
| 128 | 1027 |
| 256 | 1948 |
| 512 | 790 |
| 1024 | 2173 |
| 2048 | 2025 |
| 4096 | 104 |
| 8192 | 1243 |
| 16384 | 1676 |
| 32768 | 1285 |
| 65536 | 806 |
+-------+--------+
To filter out the values above the 99th percentile, I've created the following measure. Basically, it computes the overall percentile with the filter context removed and compares it to each Axis value.
Filter = IF(CALCULATE(PERCENTILE.INC('Table'[Axis],0.99),ALL('Table'))>=MAX('Table'[Axis]),1,0)
In the chart visual, use the Filter measure as a filter to exclude your outliers.
In this case, it filters out the last value of the table: 65,536.
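To see why only 65,536 is removed, the threshold can be reproduced outside Power BI. The sketch below is plain Python and assumes PERCENTILE.INC interpolates linearly between the two nearest ranks (the inclusive definition); `percentile_inc` is a hand-written helper, not a library call.

```python
AXIS = [2 ** n for n in range(17)]  # 1, 2, 4, ..., 65536, as in the table


def percentile_inc(values, k):
    """Inclusive percentile with linear interpolation, for 0 <= k <= 1."""
    ordered = sorted(values)
    position = k * (len(ordered) - 1)  # fractional zero-based rank
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


threshold = percentile_inc(AXIS, 0.99)
# Same comparison as the Filter measure: keep a value when threshold >= value.
kept = [value for value in AXIS if threshold >= value]
removed = [value for value in AXIS if threshold < value]
print(round(threshold, 2), removed)
```

With 17 values, the 0.99 rank falls between 32,768 and 65,536, so the threshold is about 60,293 and only the largest Axis value exceeds it.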

PowerBI: Use non-shown values for Drillthrough

I am trying to build a Power BI report for data from a SQL database where I have to show detail pages using Drillthrough. The only viable way to connect the datasets is using the database row ids.
From a user's perspective the row ids would not add any value but a lot of noise.
Is there a way to drillthrough using the row ids without showing them in a visual?
Yes, this is possible in the current release of Power BI Desktop using a workaround that involves hiding the row id column on the parent (or summary) page.
Take the following tables as example:
ALBUM
+---------+------------------------+
| AlbumId | AlbumName |
+---------+------------------------+
| 1 | Hoist |
+---------+------------------------+
| 2 | The Story Of the Ghost |
+---------+------------------------+
TRACK
+---------+---------+--------------------------+
| TrackId | AlbumId | TrackName |
+---------+---------+--------------------------+
| 1 | 1 | Julius |
+---------+---------+--------------------------+
| 2 | 1 | Down With Disease |
+---------+---------+--------------------------+
| 3 | 1 | If I Could |
+---------+---------+--------------------------+
| 4 | 1 | Riker's Mailbox |
+---------+---------+--------------------------+
| 5 | 1 | Axilla, Part II |
+---------+---------+--------------------------+
| 6 | 1 | Lifeboy |
+---------+---------+--------------------------+
| 7 | 1 | Sample In a Jar |
+---------+---------+--------------------------+
| 8 | 1 | Wolfmans Brother |
+---------+---------+--------------------------+
| 9 | 1 | Scent of a Mule |
+---------+---------+--------------------------+
| 10 | 1 | Dog Faced Boy |
+---------+---------+--------------------------+
| 11 | 1 | Demand |
+---------+---------+--------------------------+
| 12 | 2 | Ghost |
+---------+---------+--------------------------+
| 13 | 2 | Birds of a Feather |
+---------+---------+--------------------------+
| 14 | 2 | Meat |
+---------+---------+--------------------------+
| 15 | 2 | Guyute |
+---------+---------+--------------------------+
| 16 | 2 | Fikus |
+---------+---------+--------------------------+
| 17 | 2 | Shafty |
+---------+---------+--------------------------+
| 18 | 2 | Limb by Limb |
+---------+---------+--------------------------+
| 19 | 2 | Frankie Says |
+---------+---------+--------------------------+
| 20 | 2 | Brian and Robert |
+---------+---------+--------------------------+
| 21 | 2 | Water in the Sky |
+---------+---------+--------------------------+
| 22 | 2 | Roggae |
+---------+---------+--------------------------+
| 23 | 2 | Wading in the Velvet Sea |
+---------+---------+--------------------------+
| 24 | 2 | The Moma Dance |
+---------+---------+--------------------------+
| 25 | 2 | End of Session |
+---------+---------+--------------------------+
Add them as data sources and create the 1:many relationship on AlbumId. Create a parent page with a table containing AlbumId and AlbumName. Then create the details page with a table containing only the TrackName column. Drag Album table -> AlbumId into the Drillthrough filter field of the details page.
Now go back to the parent page and notice that when you right-click on an album, you get the drillthrough menu to the details page. This works, but now you have a messy AlbumId column on your parent page.
The workaround is to hide AlbumId on the parent report. First, go to the Format (paint roller) menu of the table on the parent report and turn off Column headers -> Word wrap. Then drag the column separator of the table to hide the AlbumId column. See the before and after images below.
BEFORE HIDE
AFTER HIDE
I have the powerbi file posted here if you want to see it in action.

How to run raw query with a model with dynamic fields in Django 1.9?

I have a complex result that requires writing raw SQL queries.
See https://stackoverflow.com/a/38548462/80353
The expected result is a table showing several columns.
The first column header is simply Product and the other column headers are store names.
The values are simply the product names and the aggregated sales values of the product in these stores.
Which stores are shown is entirely dynamic, up to a maximum of 9 stores.
The same in text format:
Store table
------------------------------
| id | code | address |
|-----|------|---------------|
| 1 | S1 | Kings Row |
| 2 | S2 | Queens Street |
| 3 | S3 | Jacks Place |
| 4 | S4 | Diamonds Alley|
| 5 | S5 | Hearts Road |
------------------------------
Product table
------------------------------
| id | code | name |
|-----|------|---------------|
| 1 | P1 | Saucer 12 |
| 2 | P2 | Plate 15 |
| 3 | P3 | Saucer 13 |
| 4 | P4 | Saucer 14 |
| 5 | P5 | Plate 16 |
| and many more .... |
|1000 |P1000 | Bowl 25 |
|----------------------------|
Sales table
----------------------------------------
| id | product_id | store_id | amount |
|-----|------------|----------|--------|
| 1 | 1 | 1 |7.05 |
| 2 | 1 | 2 |9.00 |
| 3 | 2 | 3 |1.00 |
| 4 | 2 | 3 |1.00 |
| 5 | 2 | 5 |1.00 |
| and many more .... |
| 1000| 20 | 4 |1.00 |
|--------------------------------------|
The relationships are:
Sales belongs to Store
Sales belongs to Product
Store has many Sales
Product has many Sales
What I want to achieve
I want to display by pagination in the following manner:
Given the stores S1-S3:
-------------------------
| product | S1 | S2 | S3 |
|---------|----|----|----|
|Saucer 12|7.05|9 | 0 |
|Plate 15 |0 |0 | 2 |
| and many more .... |
|------------------------|
For more details of the schema, see the question "How to get back aggregate values across 2 dimensions using Python Cubes?"
My question
The schema is not super important to my question which is:
Since I am going to write a complex raw query, is there a way to map the query result to a model where the fields are dynamic?
I found documentation about how to execute raw queries in Django and how to execute raw queries to existing models with fixed fields and matching table.
My question is: is it possible to do that for a model that has no matching table and has dynamic fields?
If so, how?
Or, if I choose to use a materialised view in PostgreSQL, how do I match it with a model class?
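On the dynamic-fields part, one option that avoids a model class altogether is to read the raw query result into dictionaries keyed by the cursor's column names. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a Django database cursor (both follow the DB-API, so `cursor.description` should be available in either). `pivot_sales` and the table and column names are assumptions based on the tables above, and the conditional-sum pivot is only one possible way to write the raw query.

```python
import sqlite3


def pivot_sales(conn, store_codes):
    """Return one dict per product, with one key per requested store code."""
    # Column aliases cannot be bound as parameters, so accept only plain codes.
    for code in store_codes:
        if not code.isalnum():
            raise ValueError(f"unexpected store code: {code!r}")
    sums = ", ".join(
        f'COALESCE(SUM(CASE WHEN st.code = ? THEN s.amount END), 0) AS "{code}"'
        for code in store_codes
    )
    sql = (
        f"SELECT p.name AS product, {sums} "
        "FROM product p "
        "LEFT JOIN sales s ON s.product_id = p.id "
        "LEFT JOIN store st ON st.id = s.store_id "
        "GROUP BY p.id, p.name ORDER BY p.id"
    )
    cursor = conn.execute(sql, list(store_codes))
    # The field names are only known at run time; take them from the cursor.
    fields = [column[0] for column in cursor.description]
    return [dict(zip(fields, row)) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store (id INTEGER PRIMARY KEY, code TEXT, address TEXT);
    CREATE TABLE product (id INTEGER PRIMARY KEY, code TEXT, name TEXT);
    CREATE TABLE sales (id INTEGER PRIMARY KEY, product_id INTEGER,
                        store_id INTEGER, amount REAL);
    INSERT INTO store VALUES (1, 'S1', 'Kings Row'), (2, 'S2', 'Queens Street'),
                             (3, 'S3', 'Jacks Place'), (4, 'S4', 'Diamonds Alley'),
                             (5, 'S5', 'Hearts Road');
    INSERT INTO product VALUES (1, 'P1', 'Saucer 12'), (2, 'P2', 'Plate 15');
    INSERT INTO sales VALUES (1, 1, 1, 7.05), (2, 1, 2, 9.00), (3, 2, 3, 1.00),
                             (4, 2, 3, 1.00), (5, 2, 5, 1.00);
""")
rows = pivot_sales(conn, ["S1", "S2", "S3"])
for row in rows:
    print(row)
```

Each row is a plain dict such as one with the keys product, S1, S2 and S3, so a template can iterate over whatever stores were requested. Whether this is preferable to an unmanaged model over a materialised view depends on how much ORM behaviour is needed.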