django aggregate sum of subgroup maximums - django

I have the following simplified setup in Django, using MySQL:
class Category(models.Model):
    name = models.CharField(max_length=255)

class MyTable(models.Model):
    category = models.ForeignKey('Category', on_delete=models.CASCADE)
    date = models.DateField()
    amount = models.IntegerField()

    class Meta:
        unique_together = ['category', 'date']
And a query set comprised of the following sample data for MyTable:
date        category_id  amount
2017-12-01  3            2
2018-01-01  1            100
2018-02-01  1            50
2018-03-01  2            2000
2018-04-01  2            4000
2018-05-01  3            2
2018-06-01  3            1
What I ultimately want is a way to get the sum of the amounts corresponding to the latest date for each category. To illustrate:
The latest date for category_id 1 is 2018-02-01, where the amount is 50;
The latest date for category_id 2 is 2018-04-01, where the amount is 4000;
The latest date for category_id 3 is 2018-06-01, where the amount is 1;
50 + 4000 + 1 = 4051
I'm trying to figure out how to get the value 4051 through an aggregate call on this queryset. I've tried every combination of values(), annotate() and aggregate() I could think of, and nothing gets the desired result. The following gets me the latest date for each category, but every time I try to sum the corresponding amounts, the sum is calculated over every amount instead of just the amounts at the latest dates.
MyTable.objects.values('category').annotate(Max('date'))
Is what I'm trying to do possible through Django's ORM? I posted another question about what the MySQL syntax would be for this exact example, but can't get that to apply to a Django query set either.
Any guidance appreciated.

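I think you need to do part of this in Python. If you cannot express it in the Django ORM or in SQL, the loop below works, at the cost of one extra query per category:
from django.db.models import Max

amounts = []
# One row per category with its latest date. The annotation cannot be named
# 'date', because that would conflict with the model field of the same name.
latest = MyTable.objects.values('category').annotate(latest_date=Max('date'))
for row in latest:
    # unique_together on (category, date) guarantees exactly one match.
    entry = MyTable.objects.get(category_id=row['category'], date=row['latest_date'])
    amounts.append(entry.amount)
total_amount = sum(amounts)
print(total_amount)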
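The same result can be obtained in a single query with a correlated subquery. The sketch below only illustrates the SQL shape, using Python's built-in sqlite3 module instead of MySQL; the table name mytable is a stand-in and not the name Django would generate for the model.

```python
import sqlite3

# Sample rows from the question: (date, category_id, amount).
ROWS = [
    ("2017-12-01", 3, 2),
    ("2018-01-01", 1, 100),
    ("2018-02-01", 1, 50),
    ("2018-03-01", 2, 2000),
    ("2018-04-01", 2, 4000),
    ("2018-05-01", 3, 2),
    ("2018-06-01", 3, 1),
]


def sum_of_latest_amounts(rows):
    """Sum the amount of the latest-dated row in each category."""
    conn = sqlite3.connect(":memory:")
    # "mytable" is a stand-in name chosen for this sketch.
    conn.execute(
        "CREATE TABLE mytable (date TEXT, category_id INTEGER, amount INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    # Keep a row only when its date equals the maximum date of its category,
    # then sum what is left. ISO-formatted dates compare correctly as text.
    (total,) = conn.execute(
        """
        SELECT SUM(t.amount)
        FROM mytable AS t
        WHERE t.date = (
            SELECT MAX(inner_t.date)
            FROM mytable AS inner_t
            WHERE inner_t.category_id = t.category_id
        )
        """
    ).fetchone()
    conn.close()
    return total


print(sum_of_latest_amounts(ROWS))  # 50 + 4000 + 1 = 4051
```

In the ORM, a filter of this shape is usually built with Subquery and OuterRef from django.db.models, followed by aggregate(Sum('amount')); that translation is not shown or tested here.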

Related

Counting the number of times a dynamic category appears

I hope you are all doing well,
I am stuck on a problem and can't find a way out. I have a sample table with shop name, target and date. Based on the target, a DAX measure is written which defines the category for that particular shop.
For example, if I select dates between 1 December 2022 and 2 December 2022, then the category of the a1 store will be "D"; however, if I choose dates up to 3 December 2022, the category of the a1 store will be "A". The following DAX is used to assign the category.
Category =
VAR t1 = SUM('Table'[Target])
RETURN
    IF(t1 >= 15, "A", IF(t1 < 15 && t1 >= 10, "B", IF(t1 < 10 && t1 >= 5, "C", "D")))
Now I want to count the total number of stores that fall into a specific category for the chosen dates, show that total against each of those stores, and show blank in the total row for both the category and the count.
The output should be like this.
Kindly help me create a DAX measure based on the requirements above.
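To pin down the expected numbers before writing the DAX, the logic can be sketched in plain Python. The rows below are hypothetical (the question's table was not included), chosen so that a1 is "D" for 1-2 December and "A" through 3 December as described, and the bands are assumed to be A from 15, B from 10, C from 5, otherwise D.

```python
from collections import Counter
from datetime import date

# Hypothetical (shop, date, target) rows; not the asker's real data.
ROWS = [
    ("a1", date(2022, 12, 1), 2),
    ("a1", date(2022, 12, 2), 2),
    ("a1", date(2022, 12, 3), 12),
    ("b1", date(2022, 12, 1), 6),
    ("b1", date(2022, 12, 2), 5),
    ("b1", date(2022, 12, 3), 1),
]


def category(total):
    """Assumed bands: A from 15, B from 10, C from 5, otherwise D."""
    if total >= 15:
        return "A"
    if total >= 10:
        return "B"
    if total >= 5:
        return "C"
    return "D"


def categories_in_range(rows, start, end):
    """Map each shop to the category of its summed target within [start, end]."""
    totals = Counter()
    for shop, day, target in rows:
        if start <= day <= end:
            totals[shop] += target
    return {shop: category(total) for shop, total in totals.items()}


def shops_per_category(rows, start, end):
    """Count how many shops land in each category for the selected dates."""
    return Counter(categories_in_range(rows, start, end).values())


print(categories_in_range(ROWS, date(2022, 12, 1), date(2022, 12, 2)))
print(shops_per_category(ROWS, date(2022, 12, 1), date(2022, 12, 3)))
```

The key point is the order of operations: the category is computed per shop from the summed target first, and only then are shops counted per category, which is what the measure would also need to do for each selected date range.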

Power BI - Filtering model on latest version of all attributes of all dimensions through DAX

I have a model that's comprised of multiple tables containing, for every ID, multiple rows with valid_from and valid_to dates.
This model has one table that is linked to every other table (a table working as both a fact and a dimension).
This fact has bi-directional cross filtering with the other tables.
I also have a date dimension that is not linked to any other table.
I want to be able to calculate the sum of a column in this table in the following way:
If a date range is selected, I want to get the sum of the latest value per ID from the fact table that is before the max selected date from the date dimension.
If no date is selected, I want to get the sum of the current version of the value per ID.
This comes down to selecting the latest value per ID filtered on the dates.
Because of the nature of the model (bi-directional with the fact/dimension table), I want to have the latest version of any attribute from any dimension selected in the visual.
Here's a data example and the desired outcome:
fact/dimension table:
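ID | Valid_from | Valid_to   | Amount | SK_DIM1 | SK_DIM2
-----------------------------------------------------------
1  | 01-01-2020 | 05-12-2021 | 50     | 1234    | 6787
1  | 05-13-2021 | 07-31-2021 | 100    | 1235    | 6787
1  | 08-01-2021 | 12-25-2021 | 100    | 1236    | 6787
1  | 12-26-2021 | 12-31-2021 | 200    | 1236    | 6787
1  | 01-01-2022 | 12-31-9999 | 200    | 1236    | 6788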
Dimension 1:
ID | SK   | Valid_from | Valid_to   | Name
--------------------------------------------
1  | 1234 | 10-20-2019 | 06-01-2021 | Name 1
1  | 1235 | 06-02-2021 | 07-31-2021 | Name 2
1  | 1236 | 08-01-2021 | 12-31-9999 | Name 3
Dimension 2:
ID | SK   | Valid_from | Valid_to   | Name
--------------------------------------------
1  | 6787 | 10-20-2019 | 12-31-2021 | Name 1
1  | 6788 | 01-01-2022 | 12-31-9999 | Name 2
My measure is supposed to do the following:
If no date is selected, then the result will be a matrix like the following:
Dim 1 Name | Dim 2 Name | Amount Measure
----------------------------------------
Name 3     | Name 2     | 200
If July 2021 is selected, then the result will be a matrix like the following:
Dim 1 Name | Dim 2 Name | Amount Measure
----------------------------------------
Name 2     | Name 1     | 100
So the idea here is that the measure would filter the fact table on the latest valid value in the selected date range, and then the bi-directional relationships would filter the dimensions to get the version corresponding to that row with the max validity (last valid row) in the selected date range.
I have tried the following two DAX measures, but neither works:
Solution 1: With this solution, filtering on other dimensions works and I get the last version in the selected date range for all attributes of all used dimensions. But the problem here is that the max valid from is not calculated per ID, so I only get the max valid from overall.
Amount Measure=
VAR _maxSelectedDate = MAX(Dates[Dates])
VAR _minSelectedDate = MIN(Dates[Dates])
VAR _maxValidFrom =
    CALCULATE(
        MAX(fact[valid_from]),
        DATESBETWEEN(fact[valid_from], _minSelectedDate, _maxSelectedDate)
            || DATESBETWEEN(fact[valid_to], _minSelectedDate, _maxSelectedDate)
    )
RETURN
    CALCULATE(
        SUM(fact[Amount]),
        fact[valid_from] = _maxValidFrom
    )
Solution 2: With this solution, I do get the right max valid from per ID and the resulting number is correct, but for some reason, when I use other attributes from the dimensions, it duplicates the amount for every version of that attribute. The bi-directional filtering does not work anymore with Solution 2.
Amount Measure=
VAR _maxSelectedDate = MAX(Dates[Dates])
VAR _minSelectedDate = MIN(Dates[Dates])
VAR _maxValidFromPerID =
    SUMMARIZE(
        FILTER(
            fact,
            DATESBETWEEN(fact[valid_from], _minSelectedDate, _maxSelectedDate)
                || DATESBETWEEN(fact[valid_to], _minSelectedDate, _maxSelectedDate)
        ),
        fact[ID],
        "maxValidFrom",
        MAX(fact[valid_from])
    )
RETURN
    CALCULATE(
        SUM(fact[Amount]),
        TREATAS(
            _maxValidFromPerID,
            fact[ID],
            fact[valid_from]
        )
    )
So if somebody can explain why the bi-directional filtering doesn't work anymore, that would be great. More importantly, if you have a solution that gets the latest value per ID and still keeps the filtering on other attributes, that would be even better!
Sorry for the long post, but I thought it best to give all the details for a complete understanding of my issue. This has been on my mind for a few days now, and I'm sure I'm missing something stupid, but I turned to this community for help because I can't seem to find a solution!
Thank you very much in advance for any help!
This seems to work with a dummy model. I didn't understand how you filter on ID, so if that causes a problem, let me know how you handle ID. I changed fact to facts because FACT is a DAX function. I'm also not sure whether the measure will work in your real model, so I hope you will give some feedback.
Amount Measure =
VAR ValidDate =
    CALCULATE(
        MAX(facts[Valid_to])
        ,ALLEXCEPT(facts, facts[ID])
        ,facts[Valid_to] <= MAX(Dates[Date])
    )
RETURN
    CALCULATE(
        SUM(facts[Amount])
        ,TREATAS({ValidDate}, facts[Valid_to])
    )
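As a cross-check of the intended semantics outside Power BI, the "latest version per ID as of the max selected date" rule from the question can be sketched in plain Python over the question's sample rows. This is not a translation of either DAX measure; the function names are made up for this sketch, and the rule applied is "latest valid_from on or before the max selected date".

```python
from datetime import date

# Rows copied from the question's tables.
# Fact rows: (id, valid_from, valid_to, amount, sk_dim1, sk_dim2).
FACT = [
    (1, date(2020, 1, 1), date(2021, 5, 12), 50, 1234, 6787),
    (1, date(2021, 5, 13), date(2021, 7, 31), 100, 1235, 6787),
    (1, date(2021, 8, 1), date(2021, 12, 25), 100, 1236, 6787),
    (1, date(2021, 12, 26), date(2021, 12, 31), 200, 1236, 6787),
    (1, date(2022, 1, 1), date(9999, 12, 31), 200, 1236, 6788),
]
DIM1 = {1234: "Name 1", 1235: "Name 2", 1236: "Name 3"}
DIM2 = {6787: "Name 1", 6788: "Name 2"}


def latest_rows(fact, max_selected=None):
    """Return, per ID, the latest version that started on or before max_selected.

    With no selection, every version is a candidate, so the current one wins.
    """
    latest = {}
    for row in fact:
        row_id, valid_from = row[0], row[1]
        if max_selected is not None and valid_from > max_selected:
            continue  # this version starts after the selected period
        if row_id not in latest or valid_from > latest[row_id][1]:
            latest[row_id] = row
    return list(latest.values())


def matrix(fact, max_selected=None):
    """Build (Dim 1 Name, Dim 2 Name, Amount) rows like the desired visual."""
    return [
        (DIM1[row[4]], DIM2[row[5]], row[3])
        for row in latest_rows(fact, max_selected)
    ]


print(matrix(FACT))                     # no date selected
print(matrix(FACT, date(2021, 7, 31)))  # July 2021 selected
```

Both calls reproduce the two matrices the question asks for, which makes this a convenient reference when comparing what a candidate measure returns per ID.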

Django ORM - INNER JOIN to a GROUP BY subquery on the same table using PostgreSQL

I've got a table that looks something like this (I've omitted some columns):
ID | Key_ID | Project_ID | other columns
----------------------------------------
1  | 1      | 123456     | a
2  | 1      | 123456     | b
3  | 2      | 123456     | c
4  | 2      | 123456     | d
5  | 3      | 654321     | e
6  | 3      | 654321     | f
7  | 4      | 654321     | g
8  | 4      | 654321     | h
Using Django ORM, I would need to get the equivalent of this query:
SELECT * FROM table AS t
INNER JOIN (
    SELECT MAX(ID) AS max_ID FROM table
    WHERE Project_ID = 123456
    GROUP BY Key_ID
) AS sub_query
ON t.ID = sub_query.max_ID
I've tried some aggregate and annotate combinations, but I don't seem to be able to achieve the GROUP BY in the subquery. If I could do that, I could then try to use .filter(id__in=<subquery_result>), effectively a SELECT ... WHERE ID IN <subquery_result>, although the INNER JOIN would be ideal as the subquery result could be quite large.
UPDATE:
The database I use is PostgreSQL and the accepted answer only works with this.
This is the actual model:
class SystemKey(models.Model):
    # The ID (primary key) is handled by Django.
    key_id = models.PositiveIntegerField(
        help_text="Unique key ID from System."
    )
    project = models.ForeignKey(
        "core.SystemProject",
        on_delete=models.PROTECT,
        help_text="System project that this key belongs to.",
    )
    # There are a whole bunch of other properties here
    record_created = models.DateTimeField(
        auto_now_add=True,
        help_text="Date & time when this record was added in the database.",
    )
    record_updated = models.DateTimeField(
        auto_now=True,
        help_text="Date & time when this record was updated in the database.",
    )
You can make subsequent calls to order_by and distinct while passing fields to distinct to effectively achieve what you want, with the single caveat being this is only possible for PostgreSQL:
SystemKey.objects.filter(project_id=123456).order_by('key_id', '-id').distinct('key_id')
Basically here we select entries having distinct key_id and since we ordered by id in descending order we get only the entries with max id for each key_id.
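To see which rows either approach should return, the INNER JOIN from the question can be run against the sample data. The sketch below uses Python's built-in sqlite3 module rather than PostgreSQL, and the table name system_key is a stand-in chosen for illustration.

```python
import sqlite3

# Rows from the question's table: (id, key_id, project_id, other).
ROWS = [
    (1, 1, 123456, "a"), (2, 1, 123456, "b"),
    (3, 2, 123456, "c"), (4, 2, 123456, "d"),
    (5, 3, 654321, "e"), (6, 3, 654321, "f"),
    (7, 4, 654321, "g"), (8, 4, 654321, "h"),
]


def latest_per_key(rows, project_id):
    """Return the row with the highest id for each key_id in one project."""
    conn = sqlite3.connect(":memory:")
    # "system_key" is a stand-in table name chosen for this sketch.
    conn.execute(
        "CREATE TABLE system_key (id INTEGER PRIMARY KEY, key_id INTEGER, "
        "project_id INTEGER, other TEXT)"
    )
    conn.executemany("INSERT INTO system_key VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT t.id, t.key_id, t.project_id, t.other
        FROM system_key AS t
        INNER JOIN (
            SELECT MAX(id) AS max_id
            FROM system_key
            WHERE project_id = ?
            GROUP BY key_id
        ) AS sub_query ON t.id = sub_query.max_id
        ORDER BY t.id
        """,
        (project_id,),
    ).fetchall()
    conn.close()
    return result


print(latest_per_key(ROWS, 123456))
```

For project 123456 this yields the rows with IDs 2 and 4, which are also the rows that ordering by key_id and descending id and then keeping the first row per key_id would select.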
You can run the SQL directly with Model.objects.raw().
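What about the following? Note that latest() takes a field name and returns a single object (the row with the highest id in the project), not one row per key_id:
Model.objects.filter(project_id=123456).latest('id')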

Django ORM: Get the last item from an update set based on filters and ordered

I have a table:
`UPDATES` TABLE
product_id | sales_rank | updated_on
____________________________________________
1          | 500        | 2015-12-17 15:19
2          | 600        | 2015-12-17 15:20
3          | 700        | 2015-12-17 15:21
1          | 550        | 2015-12-18 15:19
2          | 550        | 2015-12-18 15:20
3          | 1000       | 2015-12-18 15:21
I have another relating table:
`PRODUCTS` TABLE
product_id | title  | brand | picture
__________________________________________
1          | iPod   | Apple | http://cdn...
2          | iPad   | Apple | http://cdn...
3          | iPhone | Apple | http://cdn...
A user should be able to search specific product traits and also retrieve the latest update. For example, if I search for title 'iPhone', I should receive row 6 from the UPDATES table. If I search for brand 'Apple', I should receive rows 4, 5, and 6 from the UPDATES table.
What I've been doing is simply doing a group by on product_id and selecting 'max(updated_on)' to get the list of products. However, there is a problem here, because I want to also order by the sales rank...
The database I'm working with has 200,000 products and over 3 million update rows at the moment. I need to return the latest 500 update rows (remember, ordered by sales rank, too) that can satisfy certain traits.
I'm pretty lost and may be overthinking it, but I would be grateful for any help.
Thanks
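One possible shape for such a query, sketched with Python's built-in sqlite3 module on the sample rows above: join updates to products, keep only each product's most recent update through a correlated subquery, then order by sales rank and apply the limit. The function name, the secondary ordering on product_id (to break ties) and the omission of the picture column are choices made for this sketch, and nothing here has been measured against 3 million rows.

```python
import sqlite3

# (product_id, title, brand); the picture column is left out of the sketch.
PRODUCTS = [
    (1, "iPod", "Apple"),
    (2, "iPad", "Apple"),
    (3, "iPhone", "Apple"),
]
# (product_id, sales_rank, updated_on)
UPDATES = [
    (1, 500, "2015-12-17 15:19"),
    (2, 600, "2015-12-17 15:20"),
    (3, 700, "2015-12-17 15:21"),
    (1, 550, "2015-12-18 15:19"),
    (2, 550, "2015-12-18 15:20"),
    (3, 1000, "2015-12-18 15:21"),
]


def latest_updates(column, value, limit=500):
    """Latest update per matching product, ordered by sales rank."""
    if column not in ("title", "brand"):
        # Column names cannot be bound as parameters, so whitelist them.
        raise ValueError(column)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (product_id INTEGER PRIMARY KEY, title TEXT, brand TEXT)"
    )
    conn.execute(
        "CREATE TABLE updates (product_id INTEGER, sales_rank INTEGER, updated_on TEXT)"
    )
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", PRODUCTS)
    conn.executemany("INSERT INTO updates VALUES (?, ?, ?)", UPDATES)
    rows = conn.execute(
        f"""
        SELECT u.product_id, u.sales_rank, u.updated_on
        FROM updates AS u
        JOIN products AS p ON p.product_id = u.product_id
        WHERE p.{column} = ?
          AND u.updated_on = (
              SELECT MAX(u2.updated_on)
              FROM updates AS u2
              WHERE u2.product_id = u.product_id
          )
        ORDER BY u.sales_rank, u.product_id
        LIMIT ?
        """,
        (value, limit),
    ).fetchall()
    conn.close()
    return rows


print(latest_updates("title", "iPhone"))
print(latest_updates("brand", "Apple"))
```

On the sample data, searching for title 'iPhone' returns row 6 and searching for brand 'Apple' returns rows 4, 5 and 6, matching the expectations stated in the question.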

django : Model filter on date ranges

How can I filter a model on date ranges?
In my project I need to design a table based on each employee's date of joining (doj),
grouping employees who joined less than 3 months ago, 3-6 months ago, 6-12 months ago and 12-24 months ago, like this:
Depart   < 3 months   3-6 months   6-12 months   12-24 months
--------------------------------------------------------------
A dep    4            6            6             8
--------------------------------------------------------------
How do I do this filter in Django?
I have gone through this link, but it's confusing.
http://www.nerdydork.com/django-filter-model-on-date-range.html
Thanks in Advance
The range filter works like this:
MyModel.objects.filter(date__range=(range_start, range_end))
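You can use that in conjunction with datetime.timedelta to get the period boundaries. Note that timedelta has no months argument, so months are approximated here as 30 days each (dateutil's relativedelta(months=3) gives exact calendar months). For example:
from datetime import datetime, timedelta

now = datetime.now()
# < 3 months (approximated as 90 days)
three_months_ago = now - timedelta(days=90)
MyModel.objects.filter(date__range=(three_months_ago, now))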
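To cover all four columns from the question, the boundaries of each bucket can be computed up front and then passed to date__range one bucket at a time. The sketch below uses only the standard library; months_ago and bucket_ranges are hypothetical helper names, and the choice to clamp to the last day of shorter months is an assumption about the desired behaviour.

```python
import calendar
from datetime import date

# (label, newer bound in months, older bound in months), as in the question.
BUCKETS = [
    ("<3 months", 0, 3),
    ("3-6 months", 3, 6),
    ("6-12 months", 6, 12),
    ("12-24 months", 12, 24),
]


def months_ago(day, months):
    """Return the date `months` calendar months before `day`.

    The day of month is clamped when the target month is shorter,
    e.g. 31 May minus 3 months becomes the last day of February.
    """
    index = day.year * 12 + (day.month - 1) - months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def bucket_ranges(today):
    """Map each bucket label to its (start, end) pair of joining dates."""
    return {
        label: (months_ago(today, older), months_ago(today, newer))
        for label, newer, older in BUCKETS
    }


for label, (start, end) in bucket_ranges(date.today()).items():
    print(label, start, end)
```

Each (start, end) pair can then be used as MyModel.objects.filter(date__range=(start, end)).count() for a given department. Because range is inclusive at both ends, an employee who joined exactly on a boundary date would be counted in two adjacent buckets unless one end is shifted by a day.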