Conditional group by with the query API - Django

Suppose a student attendance system.
Between students and courses we have an N:M relation named attendance.
We also have a model for the attendance status (present, absent, justified, ...).
level( id, name, ... )
student( id, name, ..., id_level )
course( id, name, ... )
status( id, name, ... ) # present, absent, justified, ...
attendance( id, id_student, id_course, id_status, date, hour )
unique_together = ((id_student, id_course, id_status, date, hour),)
I'm looking for a list of the students of a given level with more than 20% absences, sorted by that percentage. Something like:
present = status.objects.get( name = 'present')
justified = status.objects.get( name = 'justified')
absent = status.objects.get( name = 'absent')
# Here is the question. How can I do this:
Student.objects.filter( level = level ).annotate(
    nPresent = count( attendance where status is present or justified ),
    nAbsent = count( attendance where status is absent ),
    pct = nAbsent / ( nAbsent + nPresent ),
).filter( pct__gt = 0.20 ).order_by( "-pct" )
If this is not possible with the query API, any workaround (lists, sets, dictionaries, ...) is welcome!
Thanks!
At the moment I have a dirty raw SQL query written by hand:
select
a.id_alumne,
coalesce ( count( p.id_control_assistencia ), 0 ) as p,
coalesce ( count( j.id_control_assistencia ), 0 ) as j,
coalesce ( count( f.id_control_assistencia ), 0 ) as f,
1.0 * coalesce ( count( f.id_control_assistencia ), 0 ) /
( coalesce ( count( p.id_control_assistencia ), 0 ) + coalesce ( count( f.id_control_assistencia ), 0 ) ) as tpc
from
alumne a
inner join
grup g
on (g.id_grup = a.id_grup )
inner join
curs c
on (c.id_curs = g.id_curs)
inner join
nivell n
on (n.id_nivell = c.id_nivell)
inner join
control_assistencia ca
on (ca.id_estat is not null and
ca.id_alumne = a.id_alumne )
inner join
impartir i
on ( i.id_impartir = ca.id_impartir )
left outer join
control_assistencia p
on (
p.id_estat in ( select id_estat from estat_control_assistencia where codi_estat in ('P','R' ) ) and
p.id_control_assistencia = ca.id_control_assistencia )
left outer join
control_assistencia j
on (
j.id_estat = ( select id_estat from estat_control_assistencia where codi_estat = 'J' ) and
j.id_control_assistencia = ca.id_control_assistencia )
left outer join
control_assistencia f
on (
f.id_estat = ( select id_estat from estat_control_assistencia where codi_estat = 'F' ) and
f.id_control_assistencia = ca.id_control_assistencia )
where
n.id_nivell = {0} and
i.dia_impartir >= '{1}' and
i.dia_impartir <= '{2}'
group by
a.id_alumne
having
1.0 * coalesce ( count( f.id_control_assistencia ), 0 ) /
( coalesce ( count( p.id_control_assistencia ), 0 ) + coalesce ( count( f.id_control_assistencia ), 0 ) )
> ( 1.0 * {3} / 100)
order by
1.0 * coalesce ( count( f.id_control_assistencia ), 0 ) /
( coalesce ( count( p.id_control_assistencia ), 0 ) + coalesce ( count( f.id_control_assistencia ), 0 ) )
desc
'''.format( nivell.pk, data_inici, data_fi, tpc )
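The three left outer joins against control_assistencia can usually be collapsed into conditional aggregates (SUM over a CASE expression). The following is a minimal sketch of that idea using Python's built-in sqlite3 module. The two-table schema, the sample rows and the names student, attendance, status, n_present, n_absent and pct are assumptions made for illustration, not the real schema; the status codes 'P', 'J' and 'F' are borrowed from the query above.

```python
import sqlite3

# Hypothetical, simplified schema: one row per attendance record,
# with the status stored as a one-letter code (P, J or F).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE attendance (
        id INTEGER PRIMARY KEY, student_id INTEGER, status TEXT
    );
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO attendance (student_id, status) VALUES
        (1, 'P'), (1, 'P'), (1, 'P'), (1, 'F'),
        (2, 'P'), (2, 'F'), (2, 'F'), (2, 'J');
""")

# Each SUM(CASE ...) counts only the rows that match its condition,
# so a single pass over attendance replaces the three self-joins.
QUERY = """
    SELECT s.name,
           SUM(CASE WHEN a.status IN ('P', 'J') THEN 1 ELSE 0 END) AS n_present,
           SUM(CASE WHEN a.status = 'F' THEN 1 ELSE 0 END) AS n_absent,
           1.0 * SUM(CASE WHEN a.status = 'F' THEN 1 ELSE 0 END) / COUNT(*) AS pct
    FROM student s
    JOIN attendance a ON a.student_id = s.id
    GROUP BY s.id, s.name
    HAVING 1.0 * SUM(CASE WHEN a.status = 'F' THEN 1 ELSE 0 END) / COUNT(*) > ?
    ORDER BY pct DESC
"""

# The threshold is passed as a bound parameter instead of str.format().
rows = conn.execute(QUERY, (0.20,)).fetchall()
for name, n_present, n_absent, pct in rows:
    print(name, n_present, n_absent, pct)
```

Passing the threshold as a bound parameter also avoids building the SQL string with format().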

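If you don't mind whether the work is done by the query API or in Python after the fact, use itertools.groupby.
from itertools import groupby

# groupby only merges adjacent rows, so the queryset must be ordered by student
attendances = (Attendance.objects.select_related('student', 'status')
               .filter(student__level=level).order_by('student_id'))
students = []
for _, g in groupby(attendances, key=lambda a: a.student_id):
    g = list(g)  # g is an iterator, so materialize it before counting
    # a.status is a model instance, so compare its name
    present = len([a for a in g if a.status.name == 'present'])
    absent = len([a for a in g if a.status.name == 'absent'])
    justified = len([a for a in g if a.status.name == 'justified'])
    total = len(g)
    percent = 100 * absent / total  # share of absences, in percent
    students.append(dict(name=g[0].student.name, present=present, absent=absent,
                         justified=justified, percent=percent))
# keep students above 20%, highest percentage first
students = sorted((s for s in students if s['percent'] > 20),
                  key=lambda s: s['percent'], reverse=True)
You can pass the resulting list of dicts to the template the same way you would any other queryset.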
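To try the grouping logic without a database, here is a small self-contained sketch that applies the same itertools.groupby idea to plain tuples. The sample records, the function name absence_report and the 20% default threshold are assumptions for illustration only.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (student_name, status_name)
records = [
    ("Ann", "present"), ("Ann", "absent"), ("Ann", "present"), ("Ann", "present"),
    ("Bob", "absent"), ("Bob", "absent"), ("Bob", "justified"), ("Bob", "present"),
    ("Cid", "present"), ("Cid", "present"),
]

def absence_report(rows, threshold=20.0):
    """Return students whose absence percentage exceeds threshold, highest first."""
    report = []
    # groupby only groups adjacent items, so sort by the grouping key first.
    ordered = sorted(rows, key=itemgetter(0))
    for name, group in groupby(ordered, key=itemgetter(0)):
        statuses = [status for _, status in group]
        absent = statuses.count("absent")
        percent = 100.0 * absent / len(statuses)
        if percent > threshold:
            report.append({"name": name, "absent": absent, "percent": percent})
    return sorted(report, key=itemgetter("percent"), reverse=True)

report = absence_report(records)
for entry in report:
    print(entry)
```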

Related

PowerBI percentage by count of another column

How do I calculate the percentage of values in column A from the count of column B?
Or the percentage of non-blank values from count of all values (blank and not)?
I have a column with unique IDs and 4 columns with values, and I want to see the 4 columns together in a bar chart, which I can do with a count.
ID     A  B  C  D
12345  1     3
12346  1  2  3  4
12347        3  4
12348        3  4
With count it would show as A=2, B=1, C=4, D=3.
In the percentage I'm looking for it would show as A=50%, B=25%, C=100%, D=75%.
Thanks!
Maya
This is achievable with the following measures:
_a =
DIVIDE (
CALCULATE ( COUNT ( tbl[A] ), ALL ( tbl[ID] ) ),
CALCULATE ( COUNT ( tbl[ID] ), ALL ( tbl[ID] ) )
)
_b =
DIVIDE (
CALCULATE ( COUNT ( tbl[B] ), ALL ( tbl[ID] ) ),
CALCULATE ( COUNT ( tbl[ID] ), ALL ( tbl[ID] ) )
)
_c =
DIVIDE (
CALCULATE ( COUNT ( tbl[C] ), ALL ( tbl[ID] ) ),
CALCULATE ( COUNT ( tbl[ID] ), ALL ( tbl[ID] ) )
)
_d =
DIVIDE (
CALCULATE ( COUNT ( tbl[D] ), ALL ( tbl[ID] ) ),
CALCULATE ( COUNT ( tbl[ID] ), ALL ( tbl[ID] ) )
)
Try this approach:
Create 4 formulas
A% = CONCATENATE(FORMAT((CALCULATE(COUNT(Sheet5[A]),ALLNOBLANKROW(Sheet5[A]))/COUNTROWS(ALL(Sheet5)))*100,"") , "%")
B% = CONCATENATE(FORMAT((CALCULATE(COUNT(Sheet5[B]),ALLNOBLANKROW(Sheet5[B]))/COUNTROWS(ALL(Sheet5)))*100,"") , "%")
C% = CONCATENATE(FORMAT((CALCULATE(COUNT(Sheet5[C]),ALLNOBLANKROW(Sheet5[C]))/COUNTROWS(ALL(Sheet5)))*100,"") , "%")
D% = CONCATENATE(FORMAT((CALCULATE(COUNT(Sheet5[D]),ALLNOBLANKROW(Sheet5[D]))/COUNTROWS(ALL(Sheet5)))*100,"") , "%")
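As a quick cross-check of the numbers these measures should return, the calculation can be reproduced outside Power BI. The sketch below is plain Python, not DAX; the placement of the sample values in columns is reconstructed so that it matches the counts given in the question (A=2, B=1, C=4, D=3), and the name non_blank_share is hypothetical.

```python
# Sample table from the question; None marks a blank cell.
rows = [
    {"ID": 12345, "A": 1, "B": None, "C": 3, "D": None},
    {"ID": 12346, "A": 1, "B": 2, "C": 3, "D": 4},
    {"ID": 12347, "A": None, "B": None, "C": 3, "D": 4},
    {"ID": 12348, "A": None, "B": None, "C": 3, "D": 4},
]

def non_blank_share(table, column):
    """Fraction of rows in which `column` is not blank."""
    filled = sum(1 for row in table if row[column] is not None)
    return filled / len(table)

shares = {column: non_blank_share(rows, column) for column in "ABCD"}
for column, share in shares.items():
    print(f"{column} = {share:.0%}")
```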

Countrows/calculate/sum rows in DAX

I am fairly new to Power BI and DAX and I'm stuck. I'll try to explain the current situation and what I want my output to be. I've tried a lot of measures with DISTINCTCOUNT, CALCULATE, you name it, but I can't find the right solution.
We've got 4 columns: Date, Employee_ID, Sick, %FTE. Every row records if an employee was sick on that date. Blank is not sick and Y = sick.
I would like to create a measure where it counts the %FTE just once when an employee is sick in a particular week, month or year.
So the output of January should be 2,13 (0,8 + 0,33 + 1) and in February 1,8 (0,8 + 1).
You would need two additional columns in the dataset, Year and Month, which the measures below reference.
Once you have that, you can use the following measures to reach the goal
Measure8 =
VAR _1 =
IF (
MAX ( 'fact'[sick] ) <> BLANK (),
RANKX (
FILTER (
ALL ( 'fact' ),
'fact'[emp_id] = MAX ( 'fact'[emp_id] )
&& 'fact'[Year] = MAX ( 'fact'[Year] )
&& 'fact'[Month] = MAX ( 'fact'[Month] )
&& 'fact'[sick] = "Y"
),
CALCULATE ( MAX ( 'fact'[date] ) ),
,
ASC,
DENSE
)
)
VAR _2 =
IF ( _1 = 1, IF ( MAX ( 'fact'[sick] ) = "y", MAX ( 'fact'[%FTE] ) ) )
RETURN
_2
Measure9 =
IF (
HASONEVALUE ( 'date'[date] ),
[Measure8],
VAR _1 =
MAXX (
GROUPBY (
ADDCOLUMNS ( 'fact', "val", [Measure8] ),
[Year],
[Month],
"total", SUMX ( CURRENTGROUP (), [val] )
),
[total]
)
VAR _2 =
MAXX (
GROUPBY (
ADDCOLUMNS ( 'fact', "val", [Measure8] ),
[Year],
"total", SUMX ( CURRENTGROUP (), [val] )
),
[total]
)
VAR _3 =
IF ( ISINSCOPE ( 'fact'[Year] ), _2, _1 )
RETURN
_3
)
Also, for any future posts, please provide the sample data and expected output as markdown tables.
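The rule from the question (count an employee's %FTE only once per month in which they were sick at least once) can be checked with a few lines of plain Python. This is only a sketch of the intended logic, not a translation of the DAX measures; the sample rows are hypothetical and were chosen so that they reproduce the totals quoted in the question.

```python
from datetime import date

# Hypothetical rows: (date, employee_id, sick, fte); sick is "Y" or None (blank).
rows = [
    (date(2023, 1, 2), 1, "Y", 0.8),
    (date(2023, 1, 3), 1, "Y", 0.8),   # same employee, same month: counted once
    (date(2023, 1, 3), 2, "Y", 0.33),
    (date(2023, 1, 4), 3, "Y", 1.0),
    (date(2023, 1, 5), 3, None, 1.0),  # not sick on this day
    (date(2023, 2, 1), 1, "Y", 0.8),
    (date(2023, 2, 2), 3, "Y", 1.0),
    (date(2023, 2, 3), 3, "Y", 1.0),
]

def sick_fte_per_month(records):
    """Sum the FTE of every employee who was sick at least once in a month."""
    seen = {}  # (year, month, employee_id) -> fte, one entry per employee per month
    for day, employee_id, sick, fte in records:
        if sick == "Y":
            seen[(day.year, day.month, employee_id)] = fte
    totals = {}
    for (year, month, _), fte in seen.items():
        totals[(year, month)] = totals.get((year, month), 0.0) + fte
    return totals

totals = sick_fte_per_month(rows)
for (year, month), total in sorted(totals.items()):
    print(year, month, round(total, 2))
```

Grouping by week or year instead of month only changes the key used in the seen dictionary.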

If it is the first date or last date use one aggregation; if any other day, use another aggregation

Let's assume I have the below dataset.
I need to create the matrix below. If the date is the first or last day of the month, I put A or B in Category 1 and calculate the SUM; if it is any other day of the month, I tag A or B as Category 2 and calculate the SUM. I guess I need to use SWITCH, don't I?
Edit: adding info from the comments.
I'd like to create 3 columns:
isStart = IF ( main_table[date] = STARTOFMONTH ( main_table[date] ), 1, 0 )
isEnd = IF ( main_table[date] = ENDOFMONTH ( 'main_table'[date] ), 1, 0 )
in_between_date =
IF ( AND ( main_table[date] <> ENDOFMONTH ( 'main_table'[date] ),
main_table[date] <> STARTOFMONTH ( main_table[date] ) ), 1, 0 )
Then, create the columns with my categories, like
start_end =
IF ( OR ( NOT ( ISERROR ( SEARCH ( "A", main_table[code] ) ) ),
main_table[code] = "B" ),
"Category 1",
BLANK () )
and
in_between =
IF ( OR ( main_table[code] = "B", main_table[code] = "A" ), "Category 2", BLANK () )
But then, what should I use in the SWITCH/IF? Something like IF ( VALUES ( 'main_table'[isStart] ) = 1, and then what?
You were on the right track but overcomplicated it a bit. You only need one extra column, "Category", which gives the category each row falls into.
Category =
IF (
startEnd[date] = STARTOFMONTH ( startEnd[date] )
|| startEnd[date] = ENDOFMONTH ( startEnd[date] );
"Category1";
"Category2"
)
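The same classification rule can be expressed in a few lines of Python, which may help when checking the expected totals by hand. This sketch uses calendar month boundaries (an assumption: it looks at the calendar, not at which dates are present in the table), and the sample rows and the helper name category are hypothetical.

```python
import calendar
from datetime import date

def category(day: date) -> str:
    """Category1 for the first or last calendar day of a month, else Category2."""
    last_day = calendar.monthrange(day.year, day.month)[1]
    if day.day == 1 or day.day == last_day:
        return "Category1"
    return "Category2"

# Hypothetical rows: (date, amount)
rows = [
    (date(2021, 1, 1), 10),
    (date(2021, 1, 15), 20),
    (date(2021, 1, 31), 30),
    (date(2021, 2, 10), 40),
]

# Sum the amounts per category.
sums = {}
for day, amount in rows:
    sums[category(day)] = sums.get(category(day), 0) + amount

print(sums)
```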

Trying to Calculate daily percentage based on Filter and ALLEXCEPT

This follows on from the question below, which I asked earlier. In addition to month and year, I now want to filter by further columns: Resource Name and RecordType.
How to calculate daily percentage over month on month volume?
Below I tried adding ALLEXCEPT, which is not working:
Total_Percentage =
VAR TotalPerMonth =
CALCULATE (
SUM ( data1[Actual] ),
FILTER ( data1, data1[Month].[Month] = EARLIER ( data1[Month].[Month] ) ),
FILTER ( data1, data1[Month].[Year] = EARLIER ( data1[Month].[Year] ) ),
ALLEXCEPT(data1,data[RecordType],data1[Resource Name]),
FILTER ( data1, data1[Flag] = 1 )
)
RETURN
DIVIDE ( data1[actual], TotalPerMonth, 0 )
This might be a bit more optimized:
Total_Percentage =
VAR TotalPerMonth =
CALCULATE (
SUM ( data1[Actual] ),
FILTER (
ALLEXCEPT ( data1, data1[RecordType], data1[Resource Name] ),
data1[Month].[Month] = EARLIER ( data1[Month].[Month] ) &&
data1[Month].[Year] = EARLIER ( data1[Month].[Year] ) &&
data1[Flag] = 1
)
)
RETURN
DIVIDE ( data1[actual], TotalPerMonth, 0 )
I think this should work for me. If you have a more optimized version, please let me know.
Total_Percentage =
VAR TotalPerMonth =
CALCULATE (
SUM ( data1[Actual] ),
FILTER ( data1, data1[Month].[Month] = EARLIER ( data1[Month].[Month] ) ),
FILTER ( data1, data1[Month].[Year] = EARLIER ( data1[Month].[Year] ) ),
FILTER(ALL('data1'),[Resource Name]=EARLIER('data1'[Resource Name])),
FILTER(ALL('data1'),[RecordType]=EARLIER('data1'[RecordType])),
FILTER ( data1, data1[Flag] = 1 )
)
RETURN
DIVIDE ( data1[actual], TotalPerMonth, 0 )
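For comparison, the intended calculation (each row's Actual divided by the total of the flagged rows for the same year, month, Resource Name and RecordType) can be sketched in plain Python. The sample rows and the names group_key and daily_shares are hypothetical; a zero or missing total yields 0, mirroring the third argument of DIVIDE.

```python
from datetime import date

# Hypothetical rows mirroring the columns used in the measure.
rows = [
    {"day": date(2023, 1, 5), "resource": "R1", "record_type": "T1", "flag": 1, "actual": 10.0},
    {"day": date(2023, 1, 9), "resource": "R1", "record_type": "T1", "flag": 1, "actual": 30.0},
    {"day": date(2023, 1, 9), "resource": "R2", "record_type": "T1", "flag": 1, "actual": 5.0},
    {"day": date(2023, 2, 1), "resource": "R1", "record_type": "T1", "flag": 0, "actual": 20.0},
]

def group_key(row):
    """Rows share a monthly total when year, month, resource and record type match."""
    return (row["day"].year, row["day"].month, row["resource"], row["record_type"])

def daily_shares(records):
    # First pass: monthly totals over flagged rows only.
    totals = {}
    for row in records:
        if row["flag"] == 1:
            totals[group_key(row)] = totals.get(group_key(row), 0.0) + row["actual"]
    # Second pass: each row's share of its group's total (0.0 if there is none).
    shares = []
    for row in records:
        total = totals.get(group_key(row), 0.0)
        shares.append(row["actual"] / total if total else 0.0)
    return shares

shares = daily_shares(rows)
print(shares)
```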

Excel: if G1=A2 (date), G2=B2 (type) and G3=C2 (item no), return the price from D2 in G4; if D2 is blank, return the price from the previous date (D3)

I need to write a statement: IF VALUE 1 equals A2 and VALUE 2 equals B2 and VALUE 3 equals C2, then return the price in D2, but if D2 is blank, return the price from the date before A2. In the example below, if VALUE 1 is 6/30/2012, VALUE 2 is Sweater and VALUE 3 is SW123456, the result will be 19.00 (from 3/31/2012), since the price for 6/30/2012 is blank.
TABLE example:
A          B        C         D
DATE       TYPE     ITEM NO   PRICE
6/30/2012  Sweater  SW123456
3/31/2012  Sweater  SW123456  19.00
VALUE 1: 6/30/2012
VALUE 2: Sweater
VALUE 3: SW123456
RESULTS:
I've moved your inputs (VALUE 1 etc) and RESULT to G1:G4, so we can use the whole range A:D for the data.
First, you want to look up the latest date before or on the date you're looking at that has a match for all other criteria and has a price that's not empty. You can do this by using the formula:
= MAX( INDEX( A:A, MATCH( 1, (A:A <= G1 ) * ( B:B = G2 ) * ( C:C = G3 ) * ( NOT( ISBLANK( D:D ) ) ), 0 ) ) )
This is an array formula, so you should confirm by ctrl+shift+Enter rather than just Enter. In your example, this should give you 3/31/12. For the sake of argument, let's call this "myDate".
Secondly, you need to find the price which matches the VALUE 2 and VALUE 3, as well as the date you've just found. This can be done as follows:
= INDEX( D:D, MATCH( 1, ( A:A = myDate ) * ( B:B = G2 ) * ( C:C = G3 ), 0 ) )
This is again an array formula. Now, all we need to do is replace "myDate" with the first function, which gives us:
= INDEX( D:D, MATCH( 1, ( A:A = MAX( INDEX( A:A, MATCH( 1, ( A:A <= G1 ) * ( B:B = G2 ) * ( C:C = G3 ) * ( NOT( ISBLANK( D:D ) ) ), 0 ) ) ) ) * ( B:B = G2 ) * ( C:C = G3 ), 0 ) )
Again, this is an array formula, so confirm with ctrl+shift+Enter.
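The two-step logic (find the latest date on or before VALUE 1 that matches the other criteria and has a price, then return that price) can be sketched in Python to check the expected result. This is an illustration of the lookup rule, not a replacement for the formula; the function name lookup_price is hypothetical and the rows are the two from the example.

```python
from datetime import date

# Rows from the example: (date, type, item_no, price); None marks a blank price.
rows = [
    (date(2012, 6, 30), "Sweater", "SW123456", None),
    (date(2012, 3, 31), "Sweater", "SW123456", 19.00),
]

def lookup_price(table, on, item_type, item_no):
    """Price from the latest matching row dated on or before `on` with a price."""
    candidates = [
        (day, price)
        for day, kind, number, price in table
        if day <= on and kind == item_type and number == item_no and price is not None
    ]
    if not candidates:
        return None  # no priced row on or before the requested date
    # max() on the date picks the most recent candidate, whatever the row order.
    return max(candidates, key=lambda pair: pair[0])[1]

result = lookup_price(rows, date(2012, 6, 30), "Sweater", "SW123456")
print(result)
```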
Try this:
=INDEX($D:$D,MATCH(MAX(IF($A:$A<=G1,IF($B:$B=G2,IF($C:$C=G3,IF($D:$D<>"",$A:$A))))),IF($A:$A<=G1,IF($B:$B=G2,IF($C:$C=G3,IF($D:$D<>"",$A:$A)))),0))
This is an array formula and needs to be confirmed with Ctrl-Shift-Enter.
I set up my sheet like this, so you can understand the references: