I have a query in MySQL that I need translated into the Django ORM. It joins two tables onto the user table, with two conditional counts on one of them. I'm pretty close to it in Django, but I get duplicate results. Here's the query:
SELECT au.id,
au.username,
COALESCE(orders_ct, 0) AS orders_ct,
COALESCE(clean_ct, 0) AS clean_ct,
COALESCE(wash_ct, 0) AS wash_ct
FROM auth_user AS au
LEFT OUTER JOIN
( SELECT user_id,
Count(*) AS orders_ct
FROM `order`
GROUP BY user_id
) AS o
ON au.id = o.user_id
LEFT OUTER JOIN
( SELECT user_id,
Count(CASE WHEN service = 'clean' THEN 1
END) AS clean_ct,
Count(CASE WHEN service = 'wash' THEN 1
END) AS wash_ct
FROM job
GROUP BY user_id
) AS j
ON au.id = j.user_id
ORDER BY au.id DESC
LIMIT 100 ;
My current Django query (which brings back unwanted duplicates):
User.objects.annotate(
orders_ct = Count( 'orders', distinct = True )
).annotate(
clean_ct = Count( Case(
When( job__service__exact = 'clean', then = 1 )
) )
).annotate(
wash_ct = Count( Case(
When( job__service__exact = 'wash', then = 1 )
) )
)
The above Django code produces the following query which is close but not right:
SELECT DISTINCT `auth_user`.`id`,
`auth_user`.`username`,
Count(DISTINCT `order`.`id`) AS `orders_ct`,
Count(CASE
WHEN `job`.`service` = 'clean' THEN 1
ELSE NULL
end) AS `clean_ct`,
Count(CASE
WHEN `job`.`service` = 'wash' THEN 1
ELSE NULL
end) AS `wash_ct`
FROM `auth_user`
LEFT OUTER JOIN `order`
ON ( `auth_user`.`id` = `order`.`user_id` )
LEFT OUTER JOIN `job`
ON ( `auth_user`.`id` = `job`.`user_id` )
GROUP BY `auth_user`.`id`
ORDER BY `auth_user`.`id` DESC
LIMIT 100
I could probably achieve it by doing some raw sql subqueries but I would like to remain as abstract as possible.
Based on this answer, you can write:
User.objects.annotate(
orders_ct = Count( 'orders', distinct = True ),
clean_ct = Count( Case(
When( job__service__exact = 'clean', then = F('job__pk') )
), distinct = True ),
wash_ct = Count( Case(
When( job__service__exact = 'wash', then = F('job__pk') )
), distinct = True )
)
Table (after joins):
user.id  order.id  job.id  job.service  your case/when  my case/when
1        1         1       wash         1               1
1        1         2       wash         1               2
1        1         3       clean        NULL            NULL
1        1         4       other        NULL            NULL
1        2         1       wash         1               1
1        2         2       wash         1               2
1        2         3       clean        NULL            NULL
1        2         4       other        NULL            NULL
Desired output for wash_ct is 2. Counting distinct values in my case/when, we will get 2.
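The effect described in this table can be reproduced outside Django. What follows is a minimal sketch using Python's built-in sqlite3 module with the same sample rows (one user, two orders, four jobs). The table and column names are simplified assumptions, not the real schema. It compares counting 1 per matching joined row with counting distinct job ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE job (id INTEGER PRIMARY KEY, user_id INTEGER, service TEXT);
    INSERT INTO auth_user VALUES (1);
    INSERT INTO orders VALUES (1, 1), (2, 1);
    INSERT INTO job VALUES (1, 1, 'wash'), (2, 1, 'wash'), (3, 1, 'clean'), (4, 1, 'other');
""")

# Both joins hang off the user, so every order row is paired with every job row.
row = conn.execute("""
    SELECT
        COUNT(DISTINCT o.id) AS orders_ct,
        COUNT(CASE WHEN j.service = 'wash' THEN 1 END) AS wash_naive,
        COUNT(DISTINCT CASE WHEN j.service = 'wash' THEN j.id END) AS wash_distinct,
        COUNT(DISTINCT CASE WHEN j.service = 'clean' THEN j.id END) AS clean_distinct
    FROM auth_user AS u
    LEFT JOIN orders AS o ON o.user_id = u.id
    LEFT JOIN job AS j ON j.user_id = u.id
    GROUP BY u.id
""").fetchone()

orders_ct, wash_naive, wash_distinct, clean_distinct = row
print(orders_ct, wash_naive, wash_distinct, clean_distinct)  # 2 4 2 1
```

The naive count is inflated by the number of orders (2 wash jobs x 2 orders = 4), while counting distinct job ids gives the expected 2.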
I think this will work; chaining the annotations on job might have produced the duplicate users.
If not, can you elaborate on the duplicates you are seeing?
User.objects.annotate(
orders_ct = Count( 'orders', distinct = True )
).annotate(
clean_ct = Count( Case(
When( job__service__exact = 'clean', then = 1 )
) ),
wash_ct = Count( Case(
When( job__service__exact = 'wash', then = 1 )
) )
)
Try adding values(). Also, when distinct=True is used, you can combine the Count() calls in a single annotate().
from django.db.models import Case, Count, F, When

User.objects.values("id").annotate(
orders_ct = Count('orders', distinct = True)
).annotate(
# Count distinct job ids: with then=1, a DISTINCT count could never exceed 1.
clean_ct = Count(Case(When(job__service__exact='clean', then=F('job__pk'))),
distinct = True),
wash_ct = Count(Case(When(job__service__exact='wash', then=F('job__pk'))),
distinct = True)
).values("id", "username", "orders_ct", "clean_ct", "wash_ct")
Using values("id") should add GROUP BY 'id' for annotations and therefore prevent duplicates, see docs.
Also, there's Coalesce, but it doesn't look like it's needed, since Count() returns int anyway. And distinct, but again the distinct in Count() should be enough.
I'm not sure whether Case is needed inside Count(), as it should count them anyway.
I am looking for a DAX measure to solve the following problem:
Count the number of rows in the dimension table where the Fact table either has no rows or the score is 0.
Table A (Dimension Table)
ID  name
1   a
2   b
3   c
Table B (Fact Table)
ID  score
1   0
1   1
1   2
2   5
Expected Result
In this example, I would expect 2, as ID=1 has one row with score=0 and ID=3 has no corresponding row in the Fact Table.
I came up with this measure which gives me the number of rows that have no corresponding row in the fact table, but I am not able to integrate the first condition:
CALCULATE(COUNTROWS('Dimension'), FILTER ('Dimension', ISBLANK ( CALCULATE ( COUNT ('Fact'[id]) ) )))
There are probably much more straightforward methods, but try this measure for now:
MyMeasure =
VAR MyTable =
ADDCOLUMNS(
Table_A,
"Not in Table_B", NOT (
Table_A[ID]
IN DISTINCT( Table_B[ID] )
),
"Zero Score in Table_B",
CALCULATE(
COUNTROWS( Table_B ),
Table_B[score] = 0
) > 0
)
RETURN
SUMX(
MyTable,
[Not in Table_B] + [Zero Score in Table_B]
)
You can also try this
CountID =
VAR ScoreZero =
COUNTROWS ( FILTER ( TableB, [score] = 0 ) )
VAR NonExistentIDs =
COUNTROWS ( EXCEPT ( DISTINCT ( TableA[ID] ), DISTINCT ( TableB[ID] ) ) )
RETURN
ScoreZero + NonExistentIDs
This also works, not sure it's a good idea to nest CALCULATE:
CALCULATE(COUNTROWS('Table_A'), FILTER ('Table_A', ISBLANK ( CALCULATE ( COUNT ('Table_B '[id]) ) ) || CALCULATE(COUNTAX(Filter('Table_B ','Table_B '[score]=0),'Table_B '[id])>=1)))
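As a cross-check of the intended logic (not a DAX solution), here is a small Python sketch over the sample data from the question. It tests each dimension ID once: does it have no fact rows, or at least one fact row with score 0? The helper name `qualifies` is purely illustrative.

```python
# Sample data copied from the question.
table_a = [(1, "a"), (2, "b"), (3, "c")]       # (ID, name)
table_b = [(1, 0), (1, 1), (1, 2), (2, 5)]     # (ID, score)

ids_with_rows = {fact_id for fact_id, _ in table_b}
ids_with_zero = {fact_id for fact_id, score in table_b if score == 0}


def qualifies(dim_id):
    """True when the ID has no fact rows or at least one zero score."""
    return dim_id not in ids_with_rows or dim_id in ids_with_zero


result = sum(1 for dim_id, _ in table_a if qualifies(dim_id))
print(result)  # 2 (ID 1 has a zero score, ID 3 has no fact rows)
```

Because each dimension row is tested once, an ID with several zero-score rows is still counted once. A measure that counts zero-score fact rows instead of dimension rows would return a larger number in that situation.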
I have a problem converting the T-SQL query below into DAX.
Overview: there are two sample tables, Table1 and Table2, with the following schema:
Table1 (ID varchar(20),Name varchar(30))
Table2 (CapID varchar(20),CAPName varchar(30), CapID_Final varchar(20))
Please note: there is a one-to-many relationship between these tables, from [ID] in Table1 to [CapID] in Table2.
I am trying to derive the CapID_Final column in Table2 based on the conditions in my T-SQL query below, which works perfectly fine:
SELECT CASE
WHEN [CapID] like 'CA%' and [CAPName]='x12345-Sample'
and [CapID] not in(select [ID] from Table1 where Name='x12345-Sample')
THEN 'Undefined_Cap_1'
WHEN [CapID] like 'CA%' and [CAPName]='z12345-Sample'
and [CapID] not in(select [ID] from Table1 where Name='z12345-Sample')
THEN 'Undefined_Cap_2'
WHEN [CapID] like 'CA%' and [CAPName]='a123-Sample'
and [CapID] not in(select [ID] from Table1 where Name='a123-Sample')
THEN 'Undefined'
ELSE [CapID]
END AS [CapID_Final] from Table2
However, I want the same derivation for the CapID_Final column in Power BI, as a calculated column using DAX.
So far I have tried the code below, but it returns "Undefined" even for matched conditions:
CapID_Final =
IF(LEFT(Table2[CapID],2)="CA" && Table2[CAPName]="z12345-Sample" &&
NOT
(COUNTROWS (
FILTER (
Table1,CONTAINS(Table1,Table1[ID],Table2[CapID])
)
) > 0),"Undefined_Cap_1","Undefined"
)
I am not familiar with DAX, however I tried and couldn't figure it out.
Could you please let me know how to convert my sql query to equivalent DAX in Power BI?
A SWITCH is basically the equivalent of a CASE clause here:
CapID_Final =
SWITCH (
TRUE (),
LEFT ( Table2[CapID], 2 ) = "CA"
&& Table2[CAPName] = "x12345-Sample"
&& NOT (
Table2[CapID]
IN CALCULATETABLE ( VALUES ( Table1[ID] ), Table1[Name] = "x12345-Sample" )
), "Undefined_Cap_1",
LEFT ( Table2[CapID], 2 ) = "CA"
&& Table2[CAPName] = "z12345-Sample"
&& NOT (
Table2[CapID]
IN CALCULATETABLE ( VALUES ( Table1[ID] ), Table1[Name] = "z12345-Sample" )
), "Undefined_Cap_2",
LEFT ( Table2[CapID], 2 ) = "CA"
&& Table2[CAPName] = "a123-Sample"
&& NOT (
Table2[CapID]
IN CALCULATETABLE ( VALUES ( Table1[ID] ), Table1[Name] = "a123-Sample" )
), "Undefined",
Table2[CapID]
)
You might even be able to refactor it a bit to be more code efficient. Assuming I didn't make any logic mistakes:
CapID_Final =
VAR IDs =
CALCULATETABLE ( VALUES ( Table1[ID] ), Table1[Name] = Table2[CAPName] )
RETURN
IF (
LEFT ( Table2[CapID], 2 ) = "CA"
&& NOT ( Table2[CapID] IN IDs ),
SWITCH (
Table2[CAPName],
"x12345-Sample", "Undefined_Cap_1",
"z12345-Sample", "Undefined_Cap_2",
"a123-Sample", "Undefined",
Table2[CapID]
),
Table2[CapID]
)
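To sanity-check the branching before porting it, the CASE logic from the question's T-SQL can be sketched in plain Python. The rows in `table1` are hypothetical sample data invented only to exercise each branch, and `cap_id_final` and `LABELS` are illustrative names. Note that `str.startswith` is case-sensitive, which may differ from `LIKE 'CA%'` depending on the database collation.

```python
# Hypothetical sample rows, invented only to exercise each branch.
table1 = [
    {"ID": "CA001", "Name": "x12345-Sample"},
    {"ID": "CA002", "Name": "z12345-Sample"},
]

# CAPName -> replacement label, taken from the T-SQL in the question.
LABELS = {
    "x12345-Sample": "Undefined_Cap_1",
    "z12345-Sample": "Undefined_Cap_2",
    "a123-Sample": "Undefined",
}


def cap_id_final(cap_id, cap_name):
    """Mirror the CASE expression from the T-SQL query in the question."""
    if cap_id.startswith("CA") and cap_name in LABELS:
        ids_for_name = {row["ID"] for row in table1 if row["Name"] == cap_name}
        if cap_id not in ids_for_name:
            return LABELS[cap_name]
    return cap_id  # ELSE branch: keep the original CapID


print(cap_id_final("CA999", "x12345-Sample"))  # Undefined_Cap_1
print(cap_id_final("CA001", "x12345-Sample"))  # CA001
```

The final `return cap_id` covers both "prefix or lookup did not match" and "CAPName is not one of the three listed names", which is what the ELSE branch of the SQL does.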
As a best practice, avoid calculated columns where you can. Used extensively, they slow down model refresh and increase model size (they typically compress less well than imported columns). Instead, calculate the value in your back-end database or in Power Query (M).
Having said this, the solution to your question is simple using the SWITCH function:
SWITCH ( <Expression>, <Value>, <Result> [, <Value>, <Result> [, … ] ] [, <Else>] )
In your case it would be as follows:
CapIDFinal:=
SWITCH(TRUE(),
AND(CONDITION_1_1, CONDITION_1_2), "Value if condition 1 is true",
AND(CONDITION_2_1, CONDITION_2_2), "Value if condition 2 is true",
"Value if none of above conditions is true"
)
Let's assume I have the dataset below.
I need to create the matrix below: if the date is the first or last day of the month, I put A or B in Category 1 and calculate the SUM; if it is any other day of the month, I tag A or B as Category 2 and calculate the SUM. I guess I need to use SWITCH, don't I?
Edit: info added from comments
I'd like to create three columns:
isStart = IF ( main_table[date] = STARTOFMONTH ( main_table[date] ), 1, 0 )
isEnd = IF ( main_table[date] = ENDOFMONTH ( 'main_table'[date] ), 1, 0 )
in_between_date =
IF ( AND ( main_table[date] <> ENDOFMONTH ( 'main_table'[date] ),
main_table[date] <> STARTOFMONTH ( main_table[date] ) ), 1, 0 )
Then, create the columns with my categories, like
start_end =
IF ( OR ( NOT ( ISERROR ( SEARCH ( "A", main_table[code] ) ) ),
main_table[code] = "B" ),
"Category 1",
BLANK () )
and
in_between =
IF ( OR ( main_table[code] = "B", main_table[code] = "A" ), "Category 2", BLANK () )
But then, what should I use in switch/if ? = if(VALUES('main_table'[isStart]) = 1, then what?
You were on the right track but overcomplicated it a bit. You only need one extra column, "Category", giving the category each row falls into.
Category =
IF (
startEnd[date] = STARTOFMONTH ( startEnd[date] )
|| startEnd[date] = ENDOFMONTH ( startEnd[date] );
"Category1";
"Category2"
)
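The same first-or-last-day rule can be sketched in Python to check it against a few dates. This is only an illustration of the logic, assuming calendar month boundaries; `category` is a hypothetical helper name, and DAX's STARTOFMONTH and ENDOFMONTH work on the dates present in the column, so results could differ if the date column has gaps.

```python
import calendar
from datetime import date


def category(d):
    """Return 'Category1' for the first or last day of d's month, else 'Category2'."""
    last_day = calendar.monthrange(d.year, d.month)[1]  # number of days in the month
    return "Category1" if d.day in (1, last_day) else "Category2"


print(category(date(2021, 1, 1)))   # Category1
print(category(date(2021, 1, 15)))  # Category2
print(category(date(2020, 2, 29)))  # Category1 (leap year month end)
```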
The resulting table is:
I am trying to get all the last records based on the id, sorted by the month.
This gives me the last record:
qs = Cashflow.objects.filter ( acct_id = a.id ).order_by('-month')[:1]
And this groups the accounts:
qs = (
    Cashflow.objects
    .filter ( acct_id = a.id )
    .values ( 'acct_id' )
    .annotate ( count = Count ( 'acct_id' ) )
    .values ( 'acct_id', 'count' )
    .order_by ( )
)
How can I combine the two queries into one?
Group by acct_id, sort by "month" and get the last record.
Is this even possible? Thanks.
EDIT:
This is the SQL version of what I am trying to do.
select *
from cashflow t
inner join (
select acct_id, max ( `month` ) as MaxDate
from cashflow
where acct_id IN ( 1,2,3,... )
group by acct_id
) tm on t.acct_id = tm.acct_id and t.month = tm.MaxDate
order by acct_id
Can this be done in pure Django or should I just do a raw query?
cheers.
Best solution I found online https://gist.github.com/ryanpitts/1304725
'''
given a Model with:
category = models.CharField(max_length=32, choices=CATEGORY_CHOICES)
pubdate = models.DateTimeField(default=datetime.now)
<other fields>
Fetch the item from each category with the latest pubdate.
'''
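from django.db.models import Max, Q

# One row per category with its latest pubdate.
model_max_set = Model.objects.values('category').annotate(max_pubdate=Max('pubdate')).order_by()
# OR together one (category, max_pubdate) condition per group.
q_statement = Q()
for pair in model_max_set:
    q_statement |= (Q(category__exact=pair['category']) & Q(pubdate=pair['max_pubdate']))
model_set = Model.objects.filter(q_statement)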
Instead of .order_by('-month')[:1] it's better to use .order_by('month').last() or .order_by('-month').first() (or earliest/latest for dates).
Of course, when grouping you can use order_by:
last_record = Cashflow.objects \
.values('acct_id') \
.annotate(count=Count('id')) \
.order_by('month') \
.last()
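For reference, the "latest row per account" logic from the SQL in the question can be sketched in plain Python as two steps: find the maximum month per account, then keep the rows that match it. The rows below are hypothetical stand-ins for the cashflow table, with months written as ISO-style strings so that string comparison matches chronological order.

```python
# Hypothetical rows standing in for the cashflow table.
cashflows = [
    {"id": 1, "acct_id": 1, "month": "2015-01"},
    {"id": 2, "acct_id": 1, "month": "2015-03"},
    {"id": 3, "acct_id": 2, "month": "2015-02"},
    {"id": 4, "acct_id": 2, "month": "2015-01"},
]

# Step 1: the latest month per account (the GROUP BY ... MAX subquery).
max_month = {}
for row in cashflows:
    acct = row["acct_id"]
    if acct not in max_month or row["month"] > max_month[acct]:
        max_month[acct] = row["month"]

# Step 2: keep the rows whose month equals their account's maximum (the join).
latest = sorted(
    (row for row in cashflows if row["month"] == max_month[row["acct_id"]]),
    key=lambda row: row["acct_id"],
)
print([row["id"] for row in latest])  # [2, 3]
```

As with the SQL join, an account with two rows in its latest month would contribute both rows.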
Suppose we have a student attendance system.
Between student and course we have an N:M relation named attendance.
We also have a model with the attendance statuses (present, absent, justified, ...).
level( id, name, ... )
student ( id, name, ..., id_level )
course( id, name, ... )
status ( id, name, ...) #present, absent, justified, ...
attendance( id, id_student, id_course, id_status, date, hour )
unique_together = ((id_student, id_course, id_status, date, hour),)
I'm looking for a list of the students in a level with more than 20% absences, sorted by percentage. Something like:
present = status.objects.get( name = 'present')
justified = status.objects.get( name = 'justified')
absent = status.objects.get( name = 'absent')
# Here is the question. How can I do this?
Student.objects.filter( level = level ).annotate(
nPresent =count( attendance where status is present or justified ),
nAbsent =count( attendance where status is absent ),
pct = nAbsent / (nAbsent + nPresent ),
).filter( pct__gte = 20 ).order_by( "-pct" )
If it is not possible to do this with the query API, any workaround (lists, sets, dictionaries, ...) is welcome!
Thanks!
---- At this time I have a dirty raw SQL query written by hand --------------------------
select
a.id_alumne,
coalesce ( count( p.id_control_assistencia ), 0 ) as p,
coalesce ( count( j.id_control_assistencia ), 0 ) as j,
coalesce ( count( f.id_control_assistencia ), 0 ) as f,
1.0 * coalesce ( count( f.id_control_assistencia ), 0 ) /
( coalesce ( count( p.id_control_assistencia ), 0 ) + coalesce ( count( f.id_control_assistencia ), 0 ) ) as tpc
from
alumne a
inner join
grup g
on (g.id_grup = a.id_grup )
inner join
curs c
on (c.id_curs = g.id_curs)
inner join
nivell n
on (n.id_nivell = c.id_nivell)
inner join
control_assistencia ca
on (ca.id_estat is not null and
ca.id_alumne = a.id_alumne )
inner join
impartir i
on ( i.id_impartir = ca.id_impartir )
left outer join
control_assistencia p
on (
p.id_estat in ( select id_estat from estat_control_assistencia where codi_estat in ('P','R' ) ) and
p.id_control_assistencia = ca.id_control_assistencia )
left outer join
control_assistencia j
on (
j.id_estat = ( select id_estat from estat_control_assistencia where codi_estat = 'J' ) and
j.id_control_assistencia = ca.id_control_assistencia )
left outer join
control_assistencia f
on (
f.id_estat = ( select id_estat from estat_control_assistencia where codi_estat = 'F' ) and
f.id_control_assistencia = ca.id_control_assistencia )
where
n.id_nivell = {0} and
i.dia_impartir >= '{1}' and
i.dia_impartir <= '{2}'
group by
a.id_alumne
having
1.0 * coalesce ( count( f.id_control_assistencia ), 0 ) /
( coalesce ( count( p.id_control_assistencia ), 0 ) + coalesce ( count( f.id_control_assistencia ), 0 ) )
> ( 1.0 * {3} / 100)
order by
1.0 * coalesce ( count( f.id_control_assistencia ), 0 ) /
( coalesce ( count( p.id_control_assistencia ), 0 ) + coalesce ( count( f.id_control_assistencia ), 0 ) )
desc
'''.format( nivell.pk, data_inici, data_fi, tpc )
If you don't care too much whether it uses the query api or python after the fact, use itertools.groupby.
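from itertools import groupby

# groupby() only merges consecutive rows, so order by student first.
attendances = (
    Attendance.objects.select_related('student', 'status')
    .filter(student__level__exact=level)
    .order_by('student_id')
)
students = []
for student, g in groupby(attendances, key=lambda a: a.student):
    g = list(g)  # g is a one-shot iterator
    # Assumes status is a foreign key to a model with a name field, as in the question.
    present = len([a for a in g if a.status.name == 'present'])
    absent = len([a for a in g if a.status.name == 'absent'])
    justified = len([a for a in g if a.status.name == 'justified'])
    total = len(g)
    percent = int(100 * absent / total)  # a percentage, not a 0-1 fraction
    students.append(dict(name=student.name, present=present, absent=absent, percent=percent))
# Keep students above 20% absence, highest first.
students = [s for s in sorted(students, key=lambda x: x['percent'], reverse=True) if s['percent'] > 20]
You can pass the resulting list of dicts to the template the same way you would any other queryset.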
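If doing the counting in the database is preferred, the same result can be obtained with conditional aggregation in a single query. The following is a minimal sketch using Python's built-in sqlite3 module. The simplified tables and the sample students (Ann, Bob, Cy) are assumptions made up for the illustration, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, id_level INTEGER);
    CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attendance (id INTEGER PRIMARY KEY, id_student INTEGER, id_status INTEGER);
    INSERT INTO student VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 1);
    INSERT INTO status VALUES (1, 'present'), (2, 'absent'), (3, 'justified');
""")
# (student id, status id) pairs: Ann 1 of 4 absent, Bob 2 of 4 absent, Cy 1 of 6 absent.
marks = [(1, 1)] * 3 + [(1, 2)] + [(2, 1), (2, 3)] + [(2, 2)] * 2 + [(3, 1)] * 5 + [(3, 2)]
conn.executemany("INSERT INTO attendance (id_student, id_status) VALUES (?, ?)", marks)

query = """
    SELECT name, n_present, n_absent,
           100.0 * n_absent / (n_absent + n_present) AS pct
    FROM (
        -- One row per student with conditional counts.
        SELECT s.name AS name,
               SUM(CASE WHEN st.name IN ('present', 'justified') THEN 1 ELSE 0 END) AS n_present,
               SUM(CASE WHEN st.name = 'absent' THEN 1 ELSE 0 END) AS n_absent
        FROM student AS s
        JOIN attendance AS a ON a.id_student = s.id
        JOIN status AS st ON st.id = a.id_status
        WHERE s.id_level = ?
        GROUP BY s.id, s.name
    )
    WHERE 100.0 * n_absent / (n_absent + n_present) > ?
    ORDER BY pct DESC
"""
rows = conn.execute(query, (1, 20)).fetchall()
print(rows)  # [('Bob', 2, 2, 50.0), ('Ann', 3, 1, 25.0)]
```

Here justified absences are counted together with present, as in the question's pseudocode, and the threshold is passed as a percentage (20) rather than a fraction.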