Django ORM: Joining a table to itself and aggregating - django

I have a table in my Django app, UserMonthScores, where every user has a "score" for every month. So, it looks like
userid | month | year | score
-------+-------+------+------
sil | 9 | 2014 | 20
sil | 8 | 2014 | 20
sil | 7 | 2014 | 20
other | 9 | 2014 | 100
other | 8 | 2014 | 1
I'd like to work out which position a specific user was in, for each month, in the ranking table. So in the above, if I ask for monthly ranking positions for user "sil", per month, I should get a response which looks like
month | year | rank
------+------+-----
9     | 2014 | 2     # in second position, behind user "other" who scored 100
8     | 2014 | 1     # in first position, ahead of user "other" who scored 1
7     | 2014 | 1     # in first position because no-one else scored anything!
The way I'd do this in SQL is to join the table to itself on month/year, select rows where the second table is for the specific user and the first table has a score at least as large as the second table's (so the user's own row is counted), group by month/year, and select the count of rows per month/year. That is:
select u1.month,u1.year,count(*) from UserMonthScores u1
inner join UserMonthScores u2
on u1.month=u2.month and u1.year=u2.year
and u2.userid = 'sil' and u1.score >= u2.score
group by u1.year, u1.month;
That works excellently. However, I do not understand how to do this query using the Django ORM. There are other questions about joining a table to itself, but they don't seem to cover this use case.
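As a sanity check of the SQL only (not an ORM translation), the self-join can be run against the sample rows with Python's built-in sqlite3 module. This is a minimal sketch: the in-memory table, the `monthly_ranks` helper and the added `order by` are assumptions made for illustration and are not part of the original app.

```python
import sqlite3

# Sample rows copied from the table in the question.
ROWS = [
    ("sil", 9, 2014, 20),
    ("sil", 8, 2014, 20),
    ("sil", 7, 2014, 20),
    ("other", 9, 2014, 100),
    ("other", 8, 2014, 1),
]

# Same self-join as in the question, with the user id as a parameter.
# The "order by" is an addition so that the output order is deterministic.
RANK_SQL = """
    select u1.month, u1.year, count(*)
    from UserMonthScores u1
    inner join UserMonthScores u2
        on u1.month = u2.month and u1.year = u2.year
        and u2.userid = ? and u1.score >= u2.score
    group by u1.year, u1.month
    order by u1.year desc, u1.month desc
"""


def monthly_ranks(userid):
    """Return (month, year, rank) tuples for one user (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table UserMonthScores "
        "(userid text, month integer, year integer, score integer)"
    )
    conn.executemany("insert into UserMonthScores values (?, ?, ?, ?)", ROWS)
    ranks = conn.execute(RANK_SQL, (userid,)).fetchall()
    conn.close()
    return ranks


ranks = monthly_ranks("sil")
print(ranks)  # [(9, 2014, 2), (8, 2014, 1), (7, 2014, 1)]
```

Because the comparison is `>=`, any other user tied with the target user is counted as well, so ties push the reported position down.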

Related

Grouping days in order to see the number of items sold using DAX in Power BI

My data looks like this:
Item | Packaged Date | Delivery Date | Days took
1 | 17-05-2019 | 19-05-2019 | 2
2 | 23-05-2019 | 24-05-2019 | 1
3 | 22-05-2019 | 30-05-2019 | 8
I want to make a table using DAX where I have two columns:
Number of Days | Items
0-5 | 2
5-10 | 1
This basically means that 2 items in total were sold within 0-5 days, and 1 item was sold within 5-10 days.
I found a way to solve my own question using a DAX expression.
I created a DAX calculated column:
AggregatedDays = IF(Dates[Days]<=5 && Dates[Days]>=0 , "0-5 Days","5-10 Days")
A new table is then created using the AggregatedDays column and Items, with Items aggregated as "Sum" under "Values".
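The bucketing logic can be sketched in Python to see which rows land in which group. This is only an illustration of the IF expression above using the three sample "Days took" values; the function name `aggregated_days` is hypothetical, and counting rows per bucket is an assumption about the intended result.

```python
from collections import Counter


def aggregated_days(days):
    # Mirrors the IF above: 0..5 inclusive goes to the first bucket,
    # every other value (including anything above 10) goes to the second.
    return "0-5 Days" if 0 <= days <= 5 else "5-10 Days"


# "Days took" values from the sample data.
days_took = [2, 1, 8]

# Count how many items fall into each bucket.
counts = Counter(aggregated_days(d) for d in days_took)
print(dict(counts))  # {'0-5 Days': 2, '5-10 Days': 1}
```

Note that, like the DAX expression, this labels any value outside 0-5 as "5-10 Days", so a third bucket would be needed if longer delivery times occur.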

How to append current and previous sessions side by side filtered by two independent slicers

Objective: I would like to obtain the difference between current and previous sessions based on date slicers.
I want the output to be 4 columns as such:
Date
Current Sessions (see measure below)
Previous Sessions (see measure below)
Difference (no measure calculated yet).
Situation:
I currently have two measures
Current Sessions: SUM(Sales[Sessions])
Previous Sessions (thanks to @Alexis Olson):
VAR datediffs =
    DATEDIFF(
        CALCULATE ( MAX ( 'Date'[Date] ) ),
        CALCULATE ( MAX ( 'Previous Date'[Date] ) ),
        DAY
    )
RETURN
    CALCULATE(
        SUM ( Sales[Sessions] ),
        USERELATIONSHIP ( 'Previous Date'[Date], 'Date'[Date] ),
        DATEADD ( 'Date'[Date], datediffs, DAY )
    )
I have three tables.
Sales
Date
Previous Date (carbon copy of Date table)
My Previous Date table has an inactive 1:1 relationship with the Date table. The Date table has an active one-to-many relationship with my Sales table.
I have two slicers at all times, comparing the same number of days from different time periods (e.g. Jan 1st to Jan 7th 2019 vs Dec 25th to Dec 31st 2019).
If I put current sessions, previous sessions and a date column from any of the three tables into a table, the output I am after looks like this:
+----------+------------------+-------------------+------------+
| date | current sessions | previous sessions | difference |
+----------+------------------+-------------------+------------+
| Jan 8th | 10000 | 7000 | 3000 |
| Jan 9th | 20000 | 10000 | 10000 |
| Jan 10th | 15000 | 16000 | -1000 |
| Jan 11th | 14000 | 12000 | 2000 |
| Jan 12th | 12000 | 14000 | -2000 |
| Jan 13th | 11000 | 16000 | -5000 |
| Jan 14th | 15000 | 18000 | -3000 |
+----------+------------------+-------------------+------------+
When I put the Sessions date on the table along with sessions and previous sessions, I get the right session amounts for each day, but the previous session amounts don't calculate correctly, I assume because they are being filtered by the date rows.
How can I override that table filter and force it to get the exact previous session amounts? Basically, I want both results appended to each other. My problem is that the previous session value is the same on each day and is basically the amount of dec 31st jan 2018, because the max date is different for each row, but I want it to be based on the slicer.
The mistake was in the first argument of the datediffs variable in the Previous Sessions formula, which should be:
CALCULATE(LASTDATE('Date'[Date]),ALLSELECTED('Date'))
This forces the measure to always use the last date of the slicer selection, overriding the date value in each row.
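The effect of that change can be sketched in plain Python. This is not how DAX evaluates filter context; it only imitates the arithmetic. The session numbers, the two 7-day ranges and the `per_row_max` flag are all hypothetical.

```python
from datetime import date, timedelta


def previous_sessions(sessions, current_dates, previous_dates, per_row_max=False):
    """Look up the 'previous' value for every date in the current selection."""
    rows = {}
    for day in current_dates:
        # per_row_max=True imitates the original measure, where the max date
        # is the row's own date; otherwise the max of the whole selection is used.
        anchor = day if per_row_max else max(current_dates)
        offset = max(previous_dates) - anchor
        rows[day] = sessions.get(day + offset, 0)
    return rows


# Hypothetical data: 100 sessions on Dec 25th 2018, 200 on Dec 26th, and so on.
start = date(2018, 12, 25)
sessions = {start + timedelta(days=i): 100 * (i + 1) for i in range(14)}
previous_dates = [start + timedelta(days=i) for i in range(7)]    # Dec 25th to Dec 31st
current_dates = [start + timedelta(days=i) for i in range(7, 14)]  # Jan 1st to Jan 7th

buggy = previous_sessions(sessions, current_dates, previous_dates, per_row_max=True)
fixed = previous_sessions(sessions, current_dates, previous_dates)

print(sorted(set(buggy.values())))  # [700]
print(list(fixed.values()))         # [100, 200, 300, 400, 500, 600, 700]
```

With the per-row anchor, every row is shifted onto the last previous date, which matches the symptom of the same previous value on each day. With the selection-wide anchor, the offset is a constant 7 days.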

Summarizing row based data over 2 separate columns within the same table

I've got the following table in Power BI:
Date      | PersonID | Hours | Age
----------|----------|-------|----
02-jan-18 | 4        | 8     | 3
06-jan-18 | 4        | 6     | 3
01-feb-18 | 4        | 6     | 3
05-feb-18 | 4        | 4     | 4
01-jan-18 | 5        | 6     | 3
01-feb-18 | 5        | 6     | 3
I have rows of data going back a few years for multiple PersonIDs. Most people have multiple rows per month because the data is split out over separate days. For every date, I have that person's age at the time (in this case, PersonID "4" had a birthday between Feb 1st and Feb 5th).
What I want to do is calculate the amount of hours PER MONTH, PER AGE. My end result should look something like this (average hours per month shown per age):
Age | Average hours per month
----------------------------------
1 | 35
2 | 31
3 | 28
4 | 28
I have no idea how to get started. How can I calculate a sum over 2 columns?
First, create a column on your table that will allow you to group by month:
MonthYear = EOMONTH(HoursAge[Date], 0)
Now you can write a measure that takes an average over a summarized table:
AvgHoursPerMonth =
    AVERAGEX(
        SUMMARIZE(
            HoursAge,
            HoursAge[MonthYear],
            HoursAge[Age],
            "MonthHours", SUM(HoursAge[Hours])
        ),
        [MonthHours]
    )
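Here's what the summarized table looks like for your given example:
MonthYear | Age | MonthHours
----------|-----|-----------
31-jan-18 | 3   | 20
28-feb-18 | 3   | 12
28-feb-18 | 4   | 4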
This would give the following result when you put the measure into a table with age on the rows:
Age | AvgHoursPerMonth
----|-----------------
3 | 16
4 | 4
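The same two-step aggregation can be reproduced in Python as a check of the numbers above. This is a sketch on the six sample rows only; the variable names are hypothetical, and grouping by (year, month) stands in for the MonthYear column.

```python
from collections import defaultdict
from datetime import date

# (Date, PersonID, Hours, Age) rows from the sample table.
rows = [
    (date(2018, 1, 2), 4, 8, 3),
    (date(2018, 1, 6), 4, 6, 3),
    (date(2018, 2, 1), 4, 6, 3),
    (date(2018, 2, 5), 4, 4, 4),
    (date(2018, 1, 1), 5, 6, 3),
    (date(2018, 2, 1), 5, 6, 3),
]

# Step 1: total hours per (month, age), like the SUMMARIZE table.
month_hours = defaultdict(int)
for day, _person, hours, age in rows:
    month_hours[(day.year, day.month, age)] += hours

# Step 2: average those monthly totals per age, like AVERAGEX.
totals_by_age = defaultdict(list)
for (_year, _month, age), total in month_hours.items():
    totals_by_age[age].append(total)
avg_hours_per_month = {age: sum(t) / len(t) for age, t in totals_by_age.items()}

print(avg_hours_per_month)  # {3: 16.0, 4: 4.0}
```

Note that the totals are per month and age across all people, not per person, which is what the SUMMARIZE grouping above does as written.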

Sum distinct values for first occurrence in Power BI

In Power BI I have some duplicate entries in my data that only have 1 column that is different, this is a "details" column.
Name | Value | Details
Item 1 | 10 | Feature 1
Item 1 | 10 | Feature 2
Item 2 | 15 | Feature 1
Item 3 | 7 | Feature 1
Item 3 | 7 | Feature 2
Item 3 | 7 | Feature 3
I realize this is an issue with the data structure, but it cannot be changed.
Basically, when I sum up my Value column on a Power BI card, I only want it to sum for each unique name, so in this case:
Total = 10 + 15 + 7
I will be using the details in a matrix, so I cannot simply remove the duplicates from within the Query Editor.
Is there any way I can filter this with a DAX formula? Just summing the first occurrence of an item?
You can create a measure as follows:
Total = SUMX(DISTINCT(Data[Name]), FIRSTNONBLANK(Data[Value], 0))
It will return the first non-blank Value for each distinct Name and sum them up.
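The intended result, one Value per distinct Name, can be sketched in Python on the sample rows. This only illustrates the behaviour described above, not how DAX evaluates the measure; the variable names are hypothetical.

```python
# (Name, Value, Details) rows from the sample table.
rows = [
    ("Item 1", 10, "Feature 1"),
    ("Item 1", 10, "Feature 2"),
    ("Item 2", 15, "Feature 1"),
    ("Item 3", 7, "Feature 1"),
    ("Item 3", 7, "Feature 2"),
    ("Item 3", 7, "Feature 3"),
]

# Keep only the first Value seen for each Name, ignoring later duplicates.
first_value = {}
for name, value, _details in rows:
    first_value.setdefault(name, value)

total = sum(first_value.values())
print(total)  # 32
```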
This should help:
Table = SUMMARIZE(Sheet2,Sheet2[Item],"Sales Quantity",SUM(Sheet2[Sales Quantiy]),"Purchase Quantity",CALCULATE(SUMX(DISTINCT(Sheet2[Purchase Quantity]),FIRSTNONBLANK(Sheet2[Purchase Quantity],0))))

Stata Collapsing by first observation date when there are multiple date observations per ID

I am working with a dataset that has purchases per date (called ItemNum) on multiple dates across 2800 individuals. Each Item is given its own line, so if an individual has purchased two items on a date, that date will appear twice. I don't care how many items were purchased on a date (with each date representing one trip), but rather the mean number of trips made across the 2800 individuals (For about 18230 lines of data). My data looks like this:
+----+------------+---------+---------------------------+
| ID | Date       | ItemNum | ItemDescript              |
+----+------------+---------+---------------------------+
| 1  | 01/22/2010 | 1       | Description of the item   |
| 1  | 01/22/2010 | 2       | Description of other item |
| 1  | 07/19/2013 | 1       |                           |
| 2  | 06/04/2012 | 1       |                           |
| 2  | 02/02/2013 | 1       |                           |
| 2  | 11/13/2013 | 1       |                           |
+----+------------+---------+---------------------------+
In the above table, person 1 made two trips (because two distinct dates are shown) and three item purchases, and person 2 made three trips. I am interested in the average number of trips across all people, but first I need to collapse it down to unique dates. So I know I need to collapse on the date, but when I do
collapse (mean) ItemNum (first) Date, by(ID)
it just takes the first date that the ID shows up, not the first occurrence of each unique date.
The next issue is that once it's collapsed, I need to take the mean of the count of the dates, not the date itself, which is also where I seem to be getting tripped up.
Or perhaps something like
clear
input ID str16 dt ItemNum
1 "01/22/2010" 1
1 "01/22/2010" 2
1 "07/19/2013" 1
end
generate Date = daily(dt,"MDY")
egen trip = tag(ID Date)
collapse (sum) trip, by(ID)
summarize trip
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
trip | 1 2 . 2 2
if what you are looking for is found in "Mean" - a single number giving the average number of trips made by the 2800 individuals (1 individual with the limited sample data given).
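The same idea, counting each distinct ID/Date pair once and then averaging the per-person counts, can be sketched in Python on the six sample rows from the question. The variable names are hypothetical, and the dates are kept as strings because only their distinctness matters here.

```python
from statistics import mean

# (ID, Date, ItemNum) rows from the sample table in the question.
purchases = [
    (1, "01/22/2010", 1),
    (1, "01/22/2010", 2),
    (1, "07/19/2013", 1),
    (2, "06/04/2012", 1),
    (2, "02/02/2013", 1),
    (2, "11/13/2013", 1),
]

# Collect the distinct dates per person; each distinct date is one trip.
trip_dates = {}
for person, day, _item in purchases:
    trip_dates.setdefault(person, set()).add(day)

trips = {person: len(days) for person, days in trip_dates.items()}
mean_trips = mean(trips.values())

print(trips)       # {1: 2, 2: 3}
print(mean_trips)  # 2.5
```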
Are you trying to do the following?
collapse (mean) ItemNum, by(ID Date) fast