How to build a Django query expression for a window function

I have a Postgres query that I want to express with Django's QuerySet builder.
I have a table:

history_event
date                           amount
2019-03-16 16:03:11.49294+05   250.00
2019-03-18 14:56:30.224846+05  250.00
2019-03-18 15:07:30.579531+05  250.00
2019-03-18 20:52:53.581835+05  5.00
2019-03-18 22:33:21.598517+05  1000.00
2019-03-18 22:50:57.157465+05  1.00
2019-03-18 22:51:44.058534+05  2.00
2019-03-18 23:11:29.531447+05  255.00
2019-03-18 23:43:43.143171+05  250.00
2019-03-18 23:44:47.445534+05  500.00
2019-03-18 23:59:23.007685+05  250.00
2019-03-19 00:01:05.103574+05  255.00
2019-03-19 00:01:05.107682+05  250.00
2019-03-19 00:01:05.11454+05   500.00
2019-03-19 00:03:48.182851+05  255.00
and I need to build a chart from this data with the amount summed cumulatively, step by step, across the dates.
This SQL collects correct data:
with data as (
    select
        date(date) as day,
        sum(amount) as day_sum
    from history_event
    group by day
)
select
    day,
    day_sum,
    sum(day_sum) over (order by day asc rows between unbounded preceding and current row)
from data
But I cannot work out how to build the equivalent QuerySet expression.
Another problem: there is no data for some days, and those days do not appear on my graph.

Nested queries like yours cannot be easily defined in ORM syntax. Subquery is limited to correlated subqueries returning a single value. This often results in contorted and inefficient ORM workarounds for queries that you can easily express in SQL.
In this case, you can use two Window functions combined with a distinct clause.
from django.db.models import F, RowRange, Sum, Window

result = (Event.objects
    .values('date', 'amount')
    # Per-day sum: partition by the date column (this assumes `date` holds
    # calendar dates; for a timestamp column, annotate with TruncDate first).
    .annotate(day_sum=Window(
        expression=Sum('amount'),
        partition_by=[F('date')],
    ))
    # Running total: all rows from the start up to and including this one.
    .annotate(total=Window(
        expression=Sum('amount'),
        frame=RowRange(start=None, end=0),
        order_by=F('date').asc(),
    ))
    .distinct('date')
    .order_by('date', '-total')
)
You need to order by '-total', as otherwise distinct discards the wrong rows, leaving you with less than the correct amounts in total.
As to the missing days: SQL has no inherent concept of calendars (and therefore of missing dates), and unless you have lots of data it should be easier to add the missing days in a Python loop, as in the sketch below. In SQL, you would do it with a calendar table.
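A minimal sketch of such a loop, assuming the query result has been evaluated into (day, running_total) pairs; the names and sample values here are illustrative, not part of the answer above:

import datetime

def fill_missing_days(rows):
    # rows: ordered list of (datetime.date, running_total) pairs
    by_day = dict(rows)
    filled, total = [], 0
    day, last_day = rows[0][0], rows[-1][0]
    while day <= last_day:
        # Days with no events carry the previous running total forward.
        total = by_day.get(day, total)
        filled.append((day, total))
        day += datetime.timedelta(days=1)
    return filled

rows = [(datetime.date(2019, 3, 16), 250), (datetime.date(2019, 3, 19), 2010)]
print(fill_missing_days(rows))  # the 17th and 18th appear with total 250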

Related

Is it possible to track Min/Max In a DynamoDB table with a single query?

I am making a table to track the minimum and maximum price something sold for in a day in a DynamoDB table. I will have a numeric min column and a numeric max column.
My goal is:
If the value I am passing in is between these numbers, ignore it and don't write to the table.
If it is above the max, it is assigned to the max.
If it is below the min, it is assigned to the min.
If the row does not exist, it is created and the number is assigned to both the min and max.
Is this possible to do in one Update command?
A one-shot update like you describe is not achievable. You might think to use conditional updates, but they cannot manage the if-this-then-write-here-else-write-there jujitsu you require. Here are some DynamoDB patterns you can use*:
(A) 1 Update, 2 Separate Min/Max Queries
A single update writes individual sale records to the table, which has a compound sort key. Min/max are not persisted, but rather returned at query time: Query PK = Product1ID and begins_with(SK, "20211218") with Limit=1. ScanIndexForward=False returns the daily max product price (DESC order); True returns the daily minimum (ASC order, the default). See the sketch after the table.
PK          SK             SalePrice  Date
Product1ID  20211217#0400  4.00
Product1ID  20211218#0500  5.00
Product1ID  20211218#0600  6.00
Product2ID  20211218#2500  25.00
Product2ID  20211218#2600  26.00
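A boto3 sketch of the pattern-A query; the table name Sales is an assumption, and the price must be zero-padded in the sort key so that lexicographic order matches numeric order:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Sales")  # assumed table name

resp = table.query(
    KeyConditionExpression=Key("PK").eq("Product1ID")
    & Key("SK").begins_with("20211218"),
    ScanIndexForward=False,  # DESC: highest SK first, i.e. the daily max
    Limit=1,
)
daily_max = resp["Items"][0] if resp["Items"] else None  # flip the flag for the min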
(B) 2 Updates, 1 Query
The table has a single record per item per day holding the min/max. Use two conditional updates, one to write the daily max and one for the daily min (see the sketch after the table). Querying is an easy PK = Product1ID and SK = "20211218".
PK          SK        Min    Max    Date
Product1ID  20211217  4.00   5.50
Product1ID  20211218  5.00   6.00
Product2ID  20211218  25.00  26.00
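One of pattern B's two conditional writes might look like this in boto3 (the names are assumptions; note that Min and Max are reserved words in DynamoDB expressions, hence the attribute-name placeholder):

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("Sales")  # assumed table name

def write_daily_max(pk, day, price):  # price must be a Decimal
    try:
        table.update_item(
            Key={"PK": pk, "SK": day},
            # Creates the record if absent; otherwise only raises the max.
            UpdateExpression="SET #mx = :p",
            ConditionExpression="attribute_not_exists(#mx) OR #mx < :p",
            ExpressionAttributeNames={"#mx": "Max"},  # Max is reserved
            ExpressionAttributeValues={":p": price},
        )
    except ClientError as e:
        # A failed condition just means the price set no new max.
        if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
            raise

The min writer is the mirror image, with #mn, "Min" and > in the condition.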
(C) 1 Query + 1 Update to Write, 1 Query to Read
A variant on B's 2+1 solution: same table design and query, but different update logic. At update time, first query the current product-day record. The updating function then decides what, if any, new min/max needs to be written, in a one-shot update, as sketched below.
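A sketch of that read-then-decide-then-write flow (illustrative names again; the read and the write are not atomic, so concurrent writers would need an added condition):

import boto3

table = boto3.resource("dynamodb").Table("Sales")  # assumed table name

def write_min_max(pk, day, price):  # price must be a Decimal
    current = table.get_item(Key={"PK": pk, "SK": day}).get("Item", {})
    updates = {}
    if "Max" not in current or current["Max"] < price:
        updates["Max"] = price
    if "Min" not in current or current["Min"] > price:
        updates["Min"] = price
    if updates:  # one-shot update covering whichever bounds moved
        table.update_item(
            Key={"PK": pk, "SK": day},
            UpdateExpression="SET " + ", ".join(f"#{k} = :{k}" for k in updates),
            ExpressionAttributeNames={"#" + k: k for k in updates},
            ExpressionAttributeValues={":" + k: v for k, v in updates.items()},
        )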
(D) Kitchen Sink
The table has both A's individual records and B's min/max records. Update as in A, then use DynamoDB Streams to kick off a Lambda on each new update; the Lambda calculates the min/max and writes the summary record back to the table. The query is simple.
PK          SK             SalePrice  Min    Max    Date
Product1ID  20211217                  4.00   5.50
Product1ID  20211217#0400  4.00
Product1ID  20211218                  5.00   6.00
Product1ID  20211218#0500  5.00
Product1ID  20211218#0600  6.00
Product2ID  20211218                  25.00  26.00
Product2ID  20211218#2500  25.00
Product2ID  20211218#2600  26.00
* The usual health warnings apply: which patterns are better or worse depends on the use case's query patterns and data volumes. The number of query/update operations may or may not be a good measure of efficient design. Ask your doctor or pharmacist.

Power BI: translating a SQL query to filters

I was wondering if this is possible in Power BI. I am extremely new to this, and I am trying to work out how a SQL query translates into a Power BI report.
SELECT
    expiresDate,
    Name,
    Addr,
    ValidFrom,
    ValidTo,
    ChildName,
    ChildValidFrom,
    ChildValidTo,
    RecValidFrom,
    RecValidTo
FROM Table
WHERE expiresDate BETWEEN <date1> AND <date2>
AND <date3> BETWEEN ValidFrom AND ValidTo
AND <date3> BETWEEN ChildValidFrom AND ChildValidTo
AND <date3> BETWEEN RecValidFrom AND RecValidTo
A brief explanation: the report looks up to 3 months ahead, so in August the report covers September (date1 = 01/09/2021) and October (date2 = 31/10/2021) data. However, the data can change on a daily basis, so it also depends on date3, which could be any day in August.
I have created a calendar table with additional columns that calculate the start and end dates from a particular date. I just can't work out how to relate this to the dataset, which is the query without the WHERE clause, so that the filters determine the result: ultimately, as I have it at present, a single date from which the start and end dates are derived as described earlier, or a display by range using the latest iteration of each record.
For example, the first part of the table:

expiresDate  AccNo  Name    Addr    ValidFrom   ValidTo     ChildName
2021-10-01   1      Robert  1 Here  2019-01-01  2021-08-16  Cheese
2021-10-01   1      Robert  1 Here  2019-01-01  2021-08-16  Rhubarb
2021-10-01   1      Bob     1 Here  2021-08-17  2021-08-23  Rhubarb
Second half of the table:

ChildValidFrom  ChildValidTo  RecValidFrom  RecValidTo
2019-01-01      2021-08-10    2019-19-01    2020-12-31
2021-08-11      2021-08-23    2021-01-01    2021-08-15
2021-08-11      2021-08-23    2021-08-16    2021-08-23
The table is a view which has squashed the data to unique records and the dates when the changes occurred. This makes the dataset considerably smaller, taking the record count from 10m down to 54k.
The requirement is that all To/From date ranges contain the specified date: either a date entered as a filter from the calendar, or today.
The report should bring out all records with an expiresDate more than 1 calendar month and less than 3 calendar months after that date. I am just using August dates for the example, so this would be 01/09/2021 - 31/10/2021.
In my example there are 3 results for AccNo 1, but only one should be displayed:
If I use the date 2021-08-01, the first row should be displayed.
If I use the date 2021-08-12, the second row should be displayed.
If I use the date 2021-08-23, the third row should be displayed.
This is because the date used should fall within the date ranges of all 3 criteria:
ValidFrom - ValidTo
ChildValidFrom - ChildValidTo
RecValidFrom - RecValidTo
Any help would be greatly appreciated. This is extremely frustrating, but I can see that if this is possible it would make a nice visual for users to check through their data by entering a date.
Many thanks

Query to calculate cost by month using AWS Athena querying

I have a table like below.
item_id  bill_start_date          bill_end_date          usage_amount  user_project
635212   2019-02-01 00:00:00.000  3/1/2019 00:00:00.000  13.345        IBM
I am trying to find the usage_amount by each month and each project. The Amazon Athena query engine is based on Presto 0.172; due to Athena's limitations, it does not recognize queries like select sysdate from dual;.
I tried to convert bill_start_date and bill_end_date from timestamp to date but failed; even current_date() didn't work in my case. I am able to calculate the total cost by hard-coding the values, but my end goal is to perform the calculation on the columns.
SELECT (FLOOR(SUM(usage_amount)*100)/100) AS total,
       user_project
FROM test_table
WHERE bill_start_date BETWEEN date '2019-02-01' AND date '2019-03-01'
GROUP BY user_project;
In Presto, current_timestamp is a SQL-standard function which does not use parentheses.
To group by month, I'd use date_trunc('month', bill_start_date).
All of these functions are documented in the Presto documentation.
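Putting those pieces together, the monthly grouping might look like the sketch below, submitted through boto3. The database name and S3 output location are placeholders, and floor(...)/100 keeps the question's two-decimal truncation:

import boto3

athena = boto3.client("athena")

# date_trunc('month', ...) buckets each bill_start_date into its month
query = """
SELECT date_trunc('month', bill_start_date) AS month,
       user_project,
       floor(sum(usage_amount) * 100) / 100 AS total
FROM test_table
GROUP BY 1, 2
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "my_database"},  # placeholder
    ResultConfiguration={"OutputLocation": "s3://my-bucket/results/"},  # placeholder
)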

Avoid manual column creation in Teradata SQL

Consider the following Teradata view named VIEW, which consists of transactional data.
ATTR1  ATTR2  DATE1       DATE2      WEEK1   WEEK2   AMOUNT
A      B      1/1/2019    1/8/2019   201901  201902  10
A      B      12/26/2018  1/8/2019   201852  201902  20
A      B      1/1/2019    1/15/2019  201901  201903  30
A      B      1/8/2019    1/15/2019  201902  201903  30
DATE1 is the posting date and DATE2 is the clearing date of the transaction. WEEK1 and WEEK2 are the fiscal weeks of DATE1 and DATE2 respectively. The ATTR columns are arbitrary attributes of the transaction. I need to report the transaction amounts by the 'week of' for the attributes.
For example, for week 201901 we would like to see the amounts of transactions with posting dates in or before week 201901 and clearing dates after 201901. See the code below.
select
    ATTR1,
    ATTR2,
    SUM(CASE WHEN WEEK2 > 201852 AND WEEK1 <= 201852 THEN AMOUNT END) AS AMT_201852,
    SUM(CASE WHEN WEEK2 > 201901 AND WEEK1 <= 201901 THEN AMOUNT END) AS AMT_201901,
    SUM(CASE WHEN WEEK2 > 201902 AND WEEK1 <= 201902 THEN AMOUNT END) AS AMT_201902
FROM VIEW
GROUP BY 1, 2
The result:

ATTR1  ATTR2  AMT_201852  AMT_201901  AMT_201902
A      B      20          60          60
As the code above suggests, we are having to manually create a column for each week, which we would like to avoid. Is there a way to dynamically create these columns as the weeks pass by, or a better way to represent this?
In a report, using WEEK1 as a filter would filter out the earlier weeks (with WEEK1 as 201901, week 201852 would be filtered out and its amounts lost). We eventually put this SQL into a Power BI dashboard.
Thanks!
You'll have to mess around with this a bit, but it should get you going in the right direction.
Teradata has an extremely useful date view built in: sys_calendar.calendar
This, for example, will get you the past three weeks in the format you have them in your sample data (YYYYWW):
select distinct
    week_of_year,
    year_of_calendar,
    (year_of_calendar * 100) + week_of_year as year_week
from sys_calendar.calendar
where calendar_date between (current_date - interval '21' day) and current_date
order by year_week
You should be able to replace current_date with a parameter to make this dynamic, and join the result to your table. If the report SQL is generated by a script rather than typed by hand, you can also build the weekly columns programmatically, as in the sketch below.
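A minimal sketch of that generation step, assuming ISO week numbering stands in for your fiscal weeks (swap in your own fiscal logic if they differ; the query shape is taken from the manual version above):

import datetime

def week_label(d):
    # ISO year/week as a YYYYWW integer, e.g. 201852
    iso = d.isocalendar()
    return iso[0] * 100 + iso[1]

def build_query(weeks):
    # One SUM(CASE ...) column per requested week, as in the manual query
    cols = ",\n".join(
        f"    SUM(CASE WHEN WEEK2 > {w} AND WEEK1 <= {w} THEN AMOUNT END) AS AMT_{w}"
        for w in weeks
    )
    return f"select\n    ATTR1,\n    ATTR2,\n{cols}\nFROM VIEW\nGROUP BY 1, 2"

# Labels for the past three weeks, counted back from today
today = datetime.date.today()
weeks = [week_label(today - datetime.timedelta(days=7 * i)) for i in (2, 1, 0)]
print(build_query(weeks))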

How to get latest record for matching ID and date in Power BI Query Editor

I have two tables:
Table A, which indicates where my truck is for each day,
Date       Truck  Region
5/20/2018  1014   NY
5/21/2018  1014   NJ
and Table B, which records when each truck inspection was done each day. Sometimes there may be more than one inspection record per day, but I only need the last one per truck per day. As you can see, I added a rank column: truck 1014 has two records for 5/20/2018, but the last one is ranked 1 (I will filter the table to rank 1).
Date       Time             Truck  Rank
5/20/2018  5/20/18 9:00 AM  1014   2
5/20/2018  5/20/18 2:00 PM  1014   1
5/21/2018  5/21/18 2:50 PM  1014   1
I want to merge these two tables together. The reason I ask how to do it in the query editor is that the relationship view cannot create a relationship on two columns; in my case I want to join by date and by truck number, which I cannot do there. What is the right approach for this situation?
In the query editor, you can use the Merge Queries button under the Home tab.
(You'll need to hold down the Ctrl key to select multiple columns.)
Once you've merged, just expand the columns from Table B that you want to bring over (e.g. Time and Rank). If you did not filter Rank = 1 before merging, you can include it when expanding and filter afterward.
Note that you can also use the LOOKUPVALUE DAX function outside of the query editor. As a new column on Table A:
Time = LOOKUPVALUE('Table B'[Time],
                   'Table B'[Date], [Date],
                   'Table B'[Truck], [Truck],
                   'Table B'[Rank], 1)