Querying Historical Data to get Month End Data - amazon-web-services

We have a history table that keeps every version of a record, flags which version is current, and records when it changed. Here is a cut-down version of it:
CREATE TABLE *schema*.hist_temp
(
record_id VARCHAR
,record_created_date DATE
,current_flag BOOLEAN
,value int
);
INSERT INTO hist_temp VALUES ('Record A','2018-06-01',1,1000);
INSERT INTO hist_temp VALUES ('Record A','2018-04-12',0,900);
INSERT INTO hist_temp VALUES ('Record A','2018-03-13',0,800);
INSERT INTO hist_temp VALUES ('Record A','2018-01-13',0,700);
So we have Record A, which has been updated 3 times. The latest version is flagged with a 1, but we want to see all 4 versions in the history.
Then we have a dates table which holds, among other things, month end dates:
SELECT
calendar_date
,trunc(month_start) as month_start
FROM common.calendar
WHERE
calendar_year = '2018'
and calendar_date < trunc(sysdate)
ORDER BY 1 desc
Sample data:
calendar_date month_start
2018-06-03 2018-06-01
2018-06-02 2018-06-01
2018-06-01 2018-06-01
2018-05-31 2018-05-01
2018-05-30 2018-05-01
2018-05-29 2018-05-01
2018-05-28 2018-05-01
2018-05-27 2018-05-01
2018-05-26 2018-05-01
2018-05-25 2018-05-01
etc
Required results:
I would like to be able to display the following - show the month start / end position for Record A for 2018
record_id, month_start, value
Record A, '2018-06-01', 1000
Record A, '2018-05-01', 900
Record A, '2018-04-01', 800
Record A, '2018-03-01', 700
Record A, '2018-02-01', 700
I am trying to write this query. I have something, but I know it is wrong because the values are summed incorrectly. Can someone help me work out how to get the correct values?

Try:
SELECT
record_id,
date_trunc('month', record_created_date)::date AS month_start,
value
FROM hist_temp
Output:
Record A 2018-06-01 1000
Record A 2018-04-01 900
Record A 2018-01-01 700
Record A 2018-03-01 800
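Note that truncating each record's own date leaves out months in which nothing changed (February and May in the required results). A minimal sketch of the "latest version on or before each month start" lookup the question describes, run here through Python's sqlite3 module rather than the original database. The month_starts table is a hypothetical stand-in for common.calendar, and dates are stored as ISO strings so that text comparison matches date order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hist_temp (
    record_id TEXT, record_created_date TEXT, current_flag INTEGER, value INTEGER
);
INSERT INTO hist_temp VALUES ('Record A','2018-06-01',1,1000);
INSERT INTO hist_temp VALUES ('Record A','2018-04-12',0,900);
INSERT INTO hist_temp VALUES ('Record A','2018-03-13',0,800);
INSERT INTO hist_temp VALUES ('Record A','2018-01-13',0,700);

-- Hypothetical helper table: one row per month start (assumption, not in the question).
CREATE TABLE month_starts (month_start TEXT);
INSERT INTO month_starts VALUES
    ('2018-02-01'), ('2018-03-01'), ('2018-04-01'), ('2018-05-01'), ('2018-06-01');
""")

# For every month start, keep the history row whose created date is the
# latest one on or before that month start (an "as-of" join).
rows = conn.execute("""
SELECT h.record_id, m.month_start, h.value
FROM month_starts AS m
JOIN hist_temp AS h
  ON h.record_created_date = (
        SELECT MAX(h2.record_created_date)
        FROM hist_temp AS h2
        WHERE h2.record_id = h.record_id
          AND h2.record_created_date <= m.month_start
     )
ORDER BY h.record_id, m.month_start DESC
""").fetchall()

for row in rows:
    print(row)
```

The correlated subquery is plain SQL, so the same shape should carry over to other engines, though date functions and performance characteristics will differ.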

Related

Filter two tables based on dates and sum the result in power bi

I have a problem summing the result of two tables. The first table has the sold quantities. The second table has the forecasted quantities. Both tables are linked to a calendar table (not represented). The third table has cut-off dates for both products.
Exported:
Product  Date      Quantity
A        1/1/2022  10
A        2/1/2022  10
A        3/1/2022  10
B        1/1/2022  5
B        2/1/2022  5
B        3/1/2022  5
Forecast:
Product  Date      Quantity
A        1/1/2022  20
A        2/1/2022  20
A        3/1/2022  20
A        4/1/2022  20
B        1/1/2022  15
B        2/1/2022  15
B        3/1/2022  15
B        4/1/2022  15
Cut Off Dates
Product  CutOffDate
A        2/1/2022
B        3/1/2022
The first goal is to filter both tables by the cut-off date. For the first table (Exported), products A and B should give:
Product  Date      Quantity
A        1/1/2022  10
A        2/1/2022  10
B        1/1/2022  5
B        2/1/2022  5
B        3/1/2022  5
Those dates are <=2/1/2022 for product A (The cut off date for product A) and <=3/1/2022 for product B (The cut off date for product B).
After that I need the same for table 2, but considering the dates after the cut off date:
Product  Date      Quantity
A        3/1/2022  20
A        4/1/2022  20
B        4/1/2022  15
Next, I need to combine both tables to obtain:
Product  Date      Quantity
A        1/1/2022  10
A        2/1/2022  10
A        3/1/2022  20
A        4/1/2022  20
B        1/1/2022  5
B        2/1/2022  5
B        3/1/2022  5
B        4/1/2022  15
Finally, my goal is to have the following result:
Date      Quantity
1/1/2022  15
2/1/2022  15
3/1/2022  25
4/1/2022  35
Thanks in advance!
I tried to do it using MAX on the dates, but I can't keep the Product filter context, so my table ends up filtered only by the maximum cut-off date (3/1/2022).
You can use this Calculated Table to get your desired result:
Result =
VAR _tbl =
    UNION(
        FILTER(
            Exported,
            Exported[Date] <= RELATED('Cut Off Dates'[CutOffDate])
        ),
        FILTER(
            Forecast,
            Forecast[Date] > RELATED('Cut Off Dates'[CutOffDate])
        )
    )
RETURN
    GROUPBY(
        _tbl,
        [Date],
        "Quantity", SUMX( CURRENTGROUP(), [Quantity])
    )
Make sure there is a relationship from both Exported and Forecast to 'Cut Off Dates' on the Product column, because RELATED depends on it.
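To check the logic independently of Power BI, the same filter, union and group steps can be sketched in plain Python. This is only an illustration of the calculation, not DAX; the function name combined_totals is made up for the example, and the dates are assumed to be the first day of January to April 2022:

```python
from collections import defaultdict
from datetime import date

# Rows are (product, date, quantity), copied from the question's tables.
exported = [
    ("A", date(2022, 1, 1), 10), ("A", date(2022, 2, 1), 10), ("A", date(2022, 3, 1), 10),
    ("B", date(2022, 1, 1), 5), ("B", date(2022, 2, 1), 5), ("B", date(2022, 3, 1), 5),
]
forecast = [
    ("A", date(2022, 1, 1), 20), ("A", date(2022, 2, 1), 20),
    ("A", date(2022, 3, 1), 20), ("A", date(2022, 4, 1), 20),
    ("B", date(2022, 1, 1), 15), ("B", date(2022, 2, 1), 15),
    ("B", date(2022, 3, 1), 15), ("B", date(2022, 4, 1), 15),
]
cut_off = {"A": date(2022, 2, 1), "B": date(2022, 3, 1)}


def combined_totals(exported, forecast, cut_off):
    """Sum exported rows up to each product's cut-off and forecast rows after it."""
    totals = defaultdict(int)
    for product, day, quantity in exported:
        if day <= cut_off[product]:   # actuals on or before the cut-off
            totals[day] += quantity
    for product, day, quantity in forecast:
        if day > cut_off[product]:    # forecast strictly after the cut-off
            totals[day] += quantity
    return dict(sorted(totals.items()))


result = combined_totals(exported, forecast, cut_off)
for day, quantity in result.items():
    print(day.isoformat(), quantity)
```

The cut-off is looked up per product on every row, which is the part that a single MAX over the cut-off dates loses.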

Latest values by category based on a selected date

First, as I am a French guy, I want to apologise in advance for my poor English!
Despite several days of searching, I cannot find the correct measure to solve my problem.
I think I am close to the solution, but I really need help to achieve this job!
Here is my need:
I have a dataset with a date table and a "Position" (i.e. "stock") table, which is my fact table, with date column.
Classic relationship between these 2 tables: many dates in the "Position" table / 1 date in the "Dates" table.
My "Dates" table has one date per day (column "AsOf").
My "Deals" table looks like this:
Id     DealId  AsOfDate   Notional
10000  1       9/1/2022   2000000
10001  1       9/1/2022   3000000
10002  1       9/1/2022   1818147
10010  4       5/31/2022  2000000
10011  4       5/31/2022  997500
10012  4       5/31/2022  1500000
10013  4       5/31/2022  1127820
10014  5       7/27/2022  140000
10015  5       7/27/2022  210000
10016  5       7/27/2022  500000
10017  5       7/27/2022  750000
10018  5       7/27/2022  625000
10019  1       8/31/2022  2000000
10020  1       8/31/2022  3000000
10021  1       8/31/2022  1801257
10022  1       8/31/2022  96976
10023  1       8/31/2022  1193365
10024  1       8/31/2022  67883
Based on a selected date (slicer with all dates from "Dates" table), I would like to calculate the sum of Last Notional for each "Deal" (column "DealId").
So, I must identify, for each Deal, the last "Asof Date" before or equal to the selected date and sum all matching rows.
Examples:
If the selected date is 9/1/2022, I will see all rows except the rows with an as-of date of 8/31/2022 for deal 1 (as the last date for this deal is 9/1/2022).
So, I expect to see:
DealId Sum of Notional
1 6 818 147
4 5 625 320
5 2 225 000
Grand Total 14 668 467
If I select 8/31/2022, the total for Deal 1 changes (as we now take the rows of 8/31 instead of 9/1):
DealId Sum of Notional
1 8 159 481
4 5 625 320
5 2 225 000
Grand Total 16 009 801
If I select 7/29, only deals 4 and 5 are active on this date, so the results should be:
DealId Sum of Notional
4 5 625 320
5 2 225 000
Grand Total 7 850 320
I think I found a solution for the rows, but my total is wrong (only the notionals of the selected date are totalled).
I also think my measure is incorrect if I try to display the notional amounts aggregated by Rating (other column in my table) instead of deal.
Here is my measure:
Last Notional =
VAR SelectedAsOf =
    SELECTEDVALUE ( Dates[AsOf] )
VAR LastAsofPerDeal =
    CALCULATE (
        MAX ( Deals[AsOf Date] ),
        FILTER ( ALLEXCEPT ( Deals, Deals[DealId] ), Deals[AsOf Date] <= SelectedAsOf )
    )
RETURN
    CALCULATE (
        SUM ( Deals[Notional] ),
        FILTER (
            ALLEXCEPT ( Deals, Deals[DealId] ),
            LastAsofPerDeal = Deals[AsOf Date]
        )
    )
I hope it is clear for you, and you will be able to find a solution for this.
Thanks in advance.
Antoine
Make sure you have no relationship between your calendar table and your deals table.
Create a slicer from your dates table and a table visual with the deal id. Then add a measure to the table as follows:
Sum of Notional =
VAR slicer = SELECTEDVALUE(Dates[Date])
VAR tbl = FILTER(Deals, Deals[AsOfDate] <= slicer)
VAR maxBalanceDate = CALCULATE(MAX(Deals[AsOfDate]), tbl)
RETURN
    CALCULATE(
        SUM(Deals[Notional]),
        Deals[AsOfDate] = maxBalanceDate
    )
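As a cross-check of the numbers the question expects, the per-deal logic can be sketched in plain Python: find each deal's latest as-of date on or before the selected date, sum the notionals on that date, and build the grand total by adding up the per-deal results rather than applying one maximum date to all deals. The function name last_notional_by_deal is made up for this illustration, and dates are read as month/day/year:

```python
from collections import defaultdict
from datetime import date

# Rows are (Id, DealId, AsOfDate, Notional), copied from the question.
deals = [
    (10000, 1, date(2022, 9, 1), 2000000), (10001, 1, date(2022, 9, 1), 3000000),
    (10002, 1, date(2022, 9, 1), 1818147),
    (10010, 4, date(2022, 5, 31), 2000000), (10011, 4, date(2022, 5, 31), 997500),
    (10012, 4, date(2022, 5, 31), 1500000), (10013, 4, date(2022, 5, 31), 1127820),
    (10014, 5, date(2022, 7, 27), 140000), (10015, 5, date(2022, 7, 27), 210000),
    (10016, 5, date(2022, 7, 27), 500000), (10017, 5, date(2022, 7, 27), 750000),
    (10018, 5, date(2022, 7, 27), 625000),
    (10019, 1, date(2022, 8, 31), 2000000), (10020, 1, date(2022, 8, 31), 3000000),
    (10021, 1, date(2022, 8, 31), 1801257), (10022, 1, date(2022, 8, 31), 96976),
    (10023, 1, date(2022, 8, 31), 1193365), (10024, 1, date(2022, 8, 31), 67883),
]


def last_notional_by_deal(deals, selected):
    """Return {deal_id: sum of notionals on that deal's latest date <= selected}."""
    latest = {}
    for _id, deal_id, as_of, _notional in deals:
        if as_of <= selected and (deal_id not in latest or as_of > latest[deal_id]):
            latest[deal_id] = as_of
    sums = defaultdict(int)
    for _id, deal_id, as_of, notional in deals:
        if latest.get(deal_id) == as_of:  # deals with no date <= selected are skipped
            sums[deal_id] += notional
    return dict(sums)


for selected in (date(2022, 9, 1), date(2022, 8, 31), date(2022, 7, 29)):
    per_deal = last_notional_by_deal(deals, selected)
    print(selected.isoformat(), per_deal, "total:", sum(per_deal.values()))
```

Because the total is a sum over deals, it stays correct when each deal's latest date differs, which is the case a single maximum date cannot handle.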

Power Pivot - calculating distinctcount per week (rather than per day)

I am having problems with a distinctcount calculated by week. I have the pivot table below. I want to calculate the distinct number of vendors that have sold more than $2400 per week.
I have the following data table "sales" (only the first rows, but it has several vendors and other weeks as well):
sales day   sales week  vendor ID  Total Sales
02.11.2020  45          vendor 1   405
03.11.2020  45          vendor 1   464
04.11.2020  45          vendor 1   466
05.11.2020  45          vendor 1   358
06.11.2020  45          vendor 1   420
07.11.2020  45          vendor 1   343
I have tried to calculate it as such:
= [vendor] =distinctcount('Sales'[vendor ID])
= [Total_sales] = sum('Sales'[Total Sales])
= [# vendors - 2400] =calculate([vendor],filter('Sales',[Total_sales]>2400))
I know that this calculation considers the sales per day, not per week. So if I used $300 instead of $2400, for instance, both vendors would be counted, since on at least one day the sales of each are higher than $300. But I only want to consider the sales on a weekly basis.
What I expect (see the pivot table below): Vendor 2 would be counted (sales = 2456), but not Vendor 1 (sales = 1341), i.e. total number of vendors = 1. However, none of the vendors are being counted, since no daily sales are higher than $2400.
Row Labels        # Vendors (distinct)  total sales
Store A                                 3797
  week 45                               3797
    Vendor 1                            1341
      02.11.2020                        348
      04.11.2020                        202
      05.11.2020                        335
      06.11.2020                        308
      07.11.2020                        148
    Vendor 2                            2456
      02.11.2020                        405
      03.11.2020                        464
      04.11.2020                        466
      05.11.2020                        358
      06.11.2020                        420
      07.11.2020                        343
I also tried to create a column of sales in which I removed the day filter, like this:
=calculate([total_sales],ALL('sales'[sales day]))
and then recalculated the [# vendors - 2400], but it still gets me the same result as above.
The question is: how do I make the distinct count consider the total sales value per week (and not per day)? Thank you for the help!
Do you have a date (calendar) table in your file? If not, try creating one, then add a relationship from its date column to sales day (assuming that column holds your dates). That way you should be able to summarize by any date grouping, e.g. month, day, week or quarter. Alternatively, you can derive the week from the existing date field by adding a new column to your table: = WEEKNUM(Tablename[sales day])
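Whichever way the week is obtained, the key step is to total the sales per vendor and week before applying the threshold. A small Python sketch of that order of operations, using the figures from the pivot table in the question; the function name vendors_over_threshold is made up for the example:

```python
from collections import defaultdict

# Rows are (sales day, sales week, vendor, total sales), taken from the pivot table.
sales = [
    ("02.11.2020", 45, "Vendor 1", 348), ("04.11.2020", 45, "Vendor 1", 202),
    ("05.11.2020", 45, "Vendor 1", 335), ("06.11.2020", 45, "Vendor 1", 308),
    ("07.11.2020", 45, "Vendor 1", 148),
    ("02.11.2020", 45, "Vendor 2", 405), ("03.11.2020", 45, "Vendor 2", 464),
    ("04.11.2020", 45, "Vendor 2", 466), ("05.11.2020", 45, "Vendor 2", 358),
    ("06.11.2020", 45, "Vendor 2", 420), ("07.11.2020", 45, "Vendor 2", 343),
]


def vendors_over_threshold(sales, threshold):
    """Count distinct vendors per week whose weekly total exceeds the threshold."""
    weekly = defaultdict(int)
    for _day, week, vendor, amount in sales:
        weekly[(week, vendor)] += amount      # aggregate to week level first
    vendors = defaultdict(set)
    for (week, vendor), total in weekly.items():
        if total > threshold:                 # then filter on the weekly total
            vendors[week].add(vendor)
    return {week: len(names) for week, names in vendors.items()}


print(vendors_over_threshold(sales, 2400))
```

Filtering the raw rows first, as the original measure does, compares each daily amount with the threshold, which is why no vendor is counted at $2400.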

DAX Grouping and Ranking in Calculated Columns

My raw data stops at sales - looking for some DAX help adding the last two as calculated columns.
customer_id order_id order_date sales total_sales_by_customer total_sales_customer_rank
------------- ---------- ------------ ------- ------------------------- ---------------------------
BM 1 9/2/2014 476 550 1
BM 2 10/27/2016 25 550 1
BM 3 9/30/2014 49 550 1
RA 4 12/18/2017 47 525 3
RA 5 9/7/2017 478 525 3
RS 6 7/5/2015 5 5 other
JH 7 5/12/2017 6 6 other
AG 8 9/7/2015 7 7 other
SP 9 5/19/2017 26 546 2
SP 10 8/16/2015 520 546 2
Let's start with total sales by customer:
total_sales_by_customer =
var custID = orders[customer_id]
return CALCULATE(SUM(orders[sales]), FILTER(orders, custID = orders[customer_id]))
First we get the custID, then filter the orders table on this ID and sum the sales per customer.
Next the ranking:
total_sales_customer_rank =
var rankMe = RANKX(orders, orders[total_sales_by_customer],,,Dense)
return if (rankMe > 3, "other", CONVERT(rankMe, STRING))
This gets the rank of each customer's total sales (taken from the first column); if the rank is greater than 3, it is replaced by "other".
On your first question: DAX does not work like a procedural programming language. Each row is evaluated individually. Take your first row: custID will be "BM".
Next we calculate the sum of the sales. We filter the whole table on custID and sum the matching rows, so for "BM" the filter actually leaves only 3 rows.
This is repeated for each row. It sounds slow, but I describe it this way only so you can understand the result you get back; in reality the engine uses cleverer logic to return data fast.
What you want to do, "Orders[Customer ID]=Orders[Customer ID]", is not possible because Orders[Customer ID] is evaluated inside the filter, row by row, so both sides refer to the same row.
var custid = VALUES(Orders[Customer ID]): VALUES returns a single-column table, and you cannot use it in this filter because you would be comparing a cell value with a table.
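For comparison, the two calculated columns can be mirrored in a few lines of Python, which makes the per-customer total and the dense ranking easy to verify against the table in the question. This is only an illustration of the arithmetic, not DAX; rank_label is a made-up helper name:

```python
from collections import defaultdict

# Rows are (customer_id, order_id, sales), copied from the question.
orders = [
    ("BM", 1, 476), ("BM", 2, 25), ("BM", 3, 49),
    ("RA", 4, 47), ("RA", 5, 478),
    ("RS", 6, 5), ("JH", 7, 6), ("AG", 8, 7),
    ("SP", 9, 26), ("SP", 10, 520),
]

# total_sales_by_customer: sum of sales over all rows sharing the customer id.
totals = defaultdict(int)
for customer_id, _order_id, sales in orders:
    totals[customer_id] += sales

# Dense rank: equal totals share a rank and no rank numbers are skipped.
distinct_totals = sorted(set(totals.values()), reverse=True)
dense_rank = {total: position + 1 for position, total in enumerate(distinct_totals)}


def rank_label(customer_id):
    """Return the rank as text for the top 3 totals, otherwise 'other'."""
    rank = dense_rank[totals[customer_id]]
    return str(rank) if rank <= 3 else "other"


for customer_id, order_id, sales in orders:
    print(customer_id, order_id, sales, totals[customer_id], rank_label(customer_id))
```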

Select most recent rows in Django ORM with grouping

We have a system written in Django to track patients recruited to clinical trials.
Spreadsheets are used to record the number of patients recruited each month throughout a financial year, so a sheet only contains 12 months of data even though a study may run for years.
There is a table in a Django database into which the spreadsheets are imported each month. The data includes the month/year, a count of patients, and some other fields. Each import includes all the previous months' data; we need this to make sure no data has been changed on the import sheet since the last import.
For example, the import table containing two imports (the first up to January and the second up to February) would look like this:
id | study_id | data_date | patient_count | [other fields] -->
100 5456 2016-04-01 10 ...
101 5456 2016-05-01 8 ...
102 5456 2016-06-01 5 ...
... all months in between ...
109 5456 2016-01-01 12 ...
110 5456 2016-02-01 NULL ...
111 5456 2016-03-01 NULL ...
112 5456 2016-04-01 10 ...
113 5456 2016-05-01 8 ...
114 5456 2016-06-01 5 ...
... all months in between ...
121 5456 2016-01-01 12 ...
122 5456 2016-02-01 6 ...
123 5456 2016-03-01 NULL ...
The other fields include a foreign key to another table containing the actual study identification number (iras_number), so I have to join to that to select the rows for a particular study.
I want the most recent values of data_date and patient_count for a study, which may span more than one financial year, so I tried this query (iras_number is passed to the function performing this query):
totals = ImportStudyData.objects.values('data_date', 'patient_count') \
.filter(import_study__iras_number=iras_number) \
.annotate(max_id=Max('id')).order_by()
However, this produces a SQL query which includes patient_count in the GROUP BY, resulting in duplicate rows:
data_date | patient_count | max_id
2016-04-01 10 100
2016-04-01 10 112
2016-05-01 8 101
2016-05-01 8 113
...
2016-01-01 12 109
2016-01-01 12 121
2016-02-01 NULL 110
2016-02-01 6 122
How do I select the most recent data_date and patient_count from the table using the ORM?
If I were writing the SQL I would do an inner select of the max(id) grouped by data_date and then use that to join, or use an IN query, to select the fields I require from the table; such as:
SELECT data_date, patient_count
FROM importstudydata
WHERE id IN (
SELECT MAX(id) AS "max_id"
FROM importstudydata INNER JOIN importstudy
ON importstudydata.import_study_id = importstudy.id
WHERE importstudy.iras_number = 5456
GROUP BY importstudydata.data_date
)
ORDER BY data_date ASC
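To confirm that this SQL returns one row per data_date, here is a small sketch that runs it against an in-memory SQLite database with Python's sqlite3 module. Only the rows shown in the example table are loaded (the elided months are left out), and the import_study_id value of 1 and the two-column importstudy table are assumptions made for the illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE importstudy (id INTEGER PRIMARY KEY, iras_number INTEGER);
CREATE TABLE importstudydata (
    id INTEGER PRIMARY KEY, import_study_id INTEGER,
    data_date TEXT, patient_count INTEGER
);
INSERT INTO importstudy VALUES (1, 5456);
""")

# (id, data_date, patient_count) for the two imports shown in the question.
sample = [
    (100, "2016-04-01", 10), (101, "2016-05-01", 8), (102, "2016-06-01", 5),
    (109, "2016-01-01", 12), (110, "2016-02-01", None), (111, "2016-03-01", None),
    (112, "2016-04-01", 10), (113, "2016-05-01", 8), (114, "2016-06-01", 5),
    (121, "2016-01-01", 12), (122, "2016-02-01", 6), (123, "2016-03-01", None),
]
conn.executemany(
    "INSERT INTO importstudydata VALUES (?, 1, ?, ?)",
    sample,
)

# Inner select: the highest id per data_date for the study.
# Outer select: the fields of exactly those rows.
rows = conn.execute("""
SELECT data_date, patient_count
FROM importstudydata
WHERE id IN (
    SELECT MAX(importstudydata.id)
    FROM importstudydata INNER JOIN importstudy
        ON importstudydata.import_study_id = importstudy.id
    WHERE importstudy.iras_number = ?
    GROUP BY importstudydata.data_date
)
ORDER BY data_date ASC
""", (5456,)).fetchall()

for row in rows:
    print(row)
```

Grouping only by data_date in the inner select is what removes the duplicates; patient_count is read from the outer table after the ids have been chosen.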
I've tried to create an inner select to replicate the SQL query; however, the inner select returns more than one field (column) and causes the query to fail:
totals = ImportStudyData.objects.values('data_date', 'patient_count') \
.filter(id__in=ImportStudyData.objects.values('data_date') \
.filter(import_study__iras_number=iras_number) \
.annotate(max_data_id=Max('id')))
Now I can't get the inner select to return only the max(id) grouped by 'data_date' and for it to be performed in a single SQL query.
For now I'm splitting the query into a number of steps to get the result I want.
First I query for the most recent id of all rows related to the study
id_qry = ImportStudyData.objects.values('data_date')\
.filter(import_study__iras_number=iras_number)\
.annotate(max_id=Max('id'))
To get a list of just the ids, stripping out the date, I use a list comprehension:
id_list = [x['max_id'] for x in id_qry]
This list is then used as a filter for the final query to get the number of patients
totals = ImportStudyData.objects.values('data_date', 'patient_count') \
.filter(id__in=id_list)
It hits the database twice, and is computationally more expensive, but for now it works and I need to move on.
I'll come back to this problem at a later date.
Use distinct():
totals = ImportStudyData.objects.values('data_date', 'patient_count').filter(import_study__iras_number=iras_number).annotate(max_id=Max('id')).order_by('data_date').distinct()