Calculate the amount of finalized tickets per material divided by the total amount of finalized tickets - Power BI

I have the following need:
Calculate the ratio between the sum of the amounts of tickets with status Finalized for each material and the sum of the amounts of all finalized tickets.
My fact table is like below:

TicketID  StatusID  MaterialID  CategoryID  Amount  FKDATE
123       3         45          9           150     12/03/2021
124       5         50          4           569     11/03/2021
125       3         78          78          556     14/03/2021
126       -1        -1          -1          -1      12/03/2021

My dimension Status is like below:

StatusID  Status
1         Open
2         In Process
3         Finalized

My dimension Material is like below:

MaterialID  MaterielLabel
1           Bikes
..          ..
I want to exclude tickets with MaterialID = -1.

Try the following:

AmountFinalizedByMaterial :=
VAR AmountFinalizedByMaterialGroup =
    CALCULATE (
        SUM ( yourFactTable[Amount] ),
        Status[Status] = "Finalized",
        yourFactTable[MaterialID] <> -1
    )
VAR TotalAmountFinalized =
    CALCULATE (
        SUM ( yourFactTable[Amount] ),
        Status[Status] = "Finalized",
        ALL ( Material )
    )
RETURN
    DIVIDE (
        AmountFinalizedByMaterialGroup,
        TotalAmountFinalized
    )
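
One thing worth double-checking: the denominator above removes the Material filter with ALL(Material) but still includes the rows with MaterialID = -1. If, as the question states, those rows should be excluded from the total as well, a variant might look like this (a sketch, reusing the same hypothetical yourFactTable name):

AmountFinalizedByMaterialPct :=
VAR AmountFinalizedByMaterialGroup =
    CALCULATE (
        SUM ( yourFactTable[Amount] ),
        Status[Status] = "Finalized",
        yourFactTable[MaterialID] <> -1
    )
VAR TotalAmountFinalized =
    CALCULATE (
        SUM ( yourFactTable[Amount] ),
        Status[Status] = "Finalized",
        yourFactTable[MaterialID] <> -1,  -- exclude the dummy row from the total too
        ALL ( Material )
    )
RETURN
    DIVIDE ( AmountFinalizedByMaterialGroup, TotalAmountFinalized )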

Single Measure in PowerBI - Divide filtered columns to produce percentage

I have a data set with columns (ID, Calc.CompleteBool where complete = 1 and incomplete = 0) of the form:

ID  | Calc.CompleteBool
----|------------------
100 | 1
101 | 0
103 | 1
105 | 1
I need to create a measure that gives me a single percentage complete. The measure needs to divide the number of IDs that meet the condition of 'complete' (1) by the total number of IDs (n).
E.g. 3 / 4 = 75%
I have tried the following and it does not work: it returns a value of zero (0). Your assistance is greatly appreciated.
Here is my code:
Calc.pctComplete =
VAR total_aps =
    CALCULATE (
        COUNT ( 'TABLE_NAME'[ID] ),
        FILTER (
            ALL ( 'TABLE_NAME' ),
            'TABLE_NAME'[Calc.CompleteBool] = 'TABLE_NAME'[Calc.CompleteBool]
        )
    )
VAR total_aps_complete =
    CALCULATE (
        COUNT ( 'TABLE_NAME'[Calc.CompleteBool] ),
        FILTER (
            ALL ( 'TABLE_NAME' ),
            'TABLE_NAME'[Calc.CompleteBool] = 1
        )
    )
RETURN
    total_aps_complete / total_aps
Update
I also need to add another filter that only keeps rows where CheckID = Yes.
There are 3,700 total IDs
There are ~ 1,500 IDs where CheckID = Yes
And roughly 8 where Calc.CompleteBool = 1
ID  | Calc.CompleteBool | CheckID
----|-------------------|--------
100 | 1                 | Yes
101 | 0                 | No
103 | 1                 | No
105 | 1                 | Yes
106 | 0                 | Yes
{100, 105, 106} are the set that would be included. So the division would be 2/3 = 66% complete.
Your result can be calculated with a simple DAX formula like the one below. CALCULATE with filter arguments turns COUNT into something similar to Excel's COUNTIFS:

Completion =
CALCULATE (
    COUNT ( Sheet1[Calc.CompleteBool] ),
    Sheet1[Calc.CompleteBool] = 1,
    Sheet1[CheckID] = "Yes"
)
    / COUNT ( Sheet1[Calc.CompleteBool] )
You may use this measure (the + 0 on __completed makes it show 0% when all rows have 0 in Calc.CompleteBool; without it you get BLANK):
Percentage% =
VAR __completed =
    CALCULATE (
        COUNTROWS ( VALUES ( 'TABLE_NAME'[ID] ) ),
        'TABLE_NAME'[Calc.CompleteBool] = 1
    ) + 0
VAR __all = COUNTROWS ( 'TABLE_NAME' )
RETURN
    DIVIDE ( __completed, __all )
Consider using DIVIDE instead of "/": https://dax.guide/divide/
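
Note that neither measure above applies the CheckID filter from the update to the denominator; dividing by all rows would give 2/5 rather than the expected 2/3. A hedged sketch that follows the update's worked example, using the same assumed TABLE_NAME and column names:

PctCompleteChecked =
VAR __completed =
    CALCULATE (
        COUNTROWS ( 'TABLE_NAME' ),
        'TABLE_NAME'[Calc.CompleteBool] = 1,
        'TABLE_NAME'[CheckID] = "Yes"
    ) + 0  -- + 0 so an empty result shows 0% instead of BLANK
VAR __checked =
    CALCULATE (
        COUNTROWS ( 'TABLE_NAME' ),
        'TABLE_NAME'[CheckID] = "Yes"
    )
RETURN
    DIVIDE ( __completed, __checked )

On the sample rows this returns 2/3, matching the {100, 105, 106} example.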

How can I write a query to carry a remaining balance of hours forward for load leveling a schedule?

I have a query result with the total amount of hours scheduled per week, in chronological order without gaps, and a set amount of hours that can be processed each week. Any hours not processed should be carried over to one or more following weeks. The following information is available:
Week | Hours | Capacity
-----|-------|---------
1    | 2000  | 160
2    | 100   | 160
3    | 0     | 140
4    | 150   | 160
5    | 500   | 160
6    | 1500  | 160
Each week it should reduce the new hours plus carried over hours by the Capacity but never go below zero. A positive value should carry into the following week(s).
Week | Hours | Capacity | LeftOver = (Hours + LAG(LeftOver) - Capacity)
-----|-------|----------|---------------------------------------------
1    | 400   | 160      | 240  (400 +   0 - 160)
2    | 100   | 160      | 180  (100 + 240 - 160)
3    | 0     | 140      | 40   (  0 + 180 - 140)
4    | 20    | 160      | 0    ( 20 +  40 - 160) (no negative, change to zero)
5    | 500   | 160      | 340  (500 +   0 - 160)
6    | 0     | 160      | 180  (  0 + 340 - 160)
I'm assuming this can be done with CTE recursion and a running value that doesn't go below zero, but I can't find any specific examples of how this would be written.
Well, you are not wrong: a recursive common table expression is indeed an option to construct a solution.
Construction of recursive queries can generally be done in steps. Run your query after every step and validate the result.
1. Define the "anchor" of your recursion: where does the recursion start? Here the start is defined by Week = 1.
2. Define a recursion iteration: what is the relation between iterations? Here that is the incrementing week numbers: d.Week = r.Week + 1.
3. Avoid negative numbers with a case expression.
Sample data

create table data
(
    Week int,
    Hours int,
    Capacity int
);

insert into data (Week, Hours, Capacity) values
(1, 400, 160),
(2, 100, 160),
(3,   0, 140),
(4,  20, 160),
(5, 500, 160),
(6,   0, 160);
Solution

with rcte as
(
    select d.Week,
           d.Hours,
           d.Capacity,
           case
               when d.Hours - d.Capacity > 0
               then d.Hours - d.Capacity
               else 0
           end as LeftOver
    from data d
    where d.Week = 1
    union all
    select d.Week,
           d.Hours,
           d.Capacity,
           case
               when d.Hours + r.LeftOver - d.Capacity > 0
               then d.Hours + r.LeftOver - d.Capacity
               else 0
           end
    from rcte r
    join data d
      on d.Week = r.Week + 1
)
select r.Week,
       r.Hours,
       r.Capacity,
       r.LeftOver
from rcte r
order by r.Week;
Result

Week  Hours  Capacity  LeftOver
----  -----  --------  --------
   1    400       160       240
   2    100       160       180
   3      0       140        40
   4     20       160         0
   5    500       160       340
   6      0       160       180
Fiddle to see things in action.
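One caveat if you extend this beyond the sample: SQL Server stops a recursive CTE after 100 recursion levels by default, so a schedule of more than 100 weeks would fail with "the maximum recursion 100 has been exhausted". The cap can be raised or removed with a query hint on the outer statement, for example:

select r.Week, r.Hours, r.Capacity, r.LeftOver
from rcte r
order by r.Week
option (maxrecursion 0); -- 0 = no limit; or set it to the number of weeks you expect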
I ended up writing a few CTEs, then a recursive CTE, and got what I needed. The capacity is a static number here but will be replaced later with one that takes holidays and vacations into account. I will also need to consider the initial LeftOver value for the first week, but I could run this query over an earlier date period to find the most recent date with a zero LeftOver value, use that as a new start date, and then filter out those earlier weeks in the final query.
DECLARE @StartDate date = (SELECT MAX(FirstDayOfWorkWeek) FROM dbo._Calendar WHERE [Date] <= GETDATE());
DECLARE @EndDate date = DATEADD(week, 12, @StartDate);
DECLARE @EmployeeQty int = (SELECT ISNULL(COUNT(*), 0) FROM Employee WHERE DefaultDepartment IN (4) AND Hidden = 0 AND DateTerminated IS NULL);

WITH hours AS (
    /* GRAB ALL NEW HOURS SCHEDULED FOR EACH WEEK IN THE SELECTED PERIOD */
    SELECT c.FirstDayOfWorkWeek as [Date]
         , SUM(budget.Hours) as hours
    FROM dbo.Project_Phase phase
    JOIN dbo.Project_Budget_Labor budget on phase.ID = budget.Phase
    JOIN dbo._Calendar c on CONVERT(date, phase.Date1) = c.[Date]
    WHERE phase.CompletedOn IS NULL AND phase.Project <> 4266
      AND phase.Date1 BETWEEN @StartDate AND @EndDate
      AND budget.Department IN (4)
    GROUP BY c.FirstDayOfWorkWeek
)
, weeks AS (
    /* CREATE BLANK ROWS FOR EACH WEEK AND JOIN TO ACTUAL HOURS TO ELIMINATE GAPS */
    /* ADD A ROW NUMBER FOR RECURSION IN NEXT CTE */
    SELECT cal.[Date]
         , ROW_NUMBER() OVER(ORDER BY cal.[Date]) as [rownum]
         , ISNULL(SUM(hours.Hours), 0) as Hours
    FROM (SELECT FirstDayOfWorkWeek as [Date] FROM dbo._Calendar WHERE [Date] BETWEEN @StartDate AND @EndDate GROUP BY FirstDayOfWorkWeek) as cal
    LEFT JOIN hours on cal.[Date] = hours.[Date]
    GROUP BY cal.[Date]
)
, spread AS (
    /* GRAB FIRST WEEK AND USE RECURSION TO CREATE RUNNING TOTAL THAT DOES NOT DROP BELOW ZERO */
    SELECT TOP 1 [Date]
         , rownum
         , Hours
         , @EmployeeQty * 40 as Capacity
         , CONVERT(numeric(9,2), 0.00) as LeftOver
         , Hours as running
    FROM weeks
    ORDER BY rownum
    UNION ALL
    SELECT curr.[Date]
         , curr.rownum
         , curr.Hours
         , @EmployeeQty * 40 as Capacity
         , CONVERT(numeric(9,2), CASE WHEN curr.Hours + prev.LeftOver - (@EmployeeQty * 40) < 0 THEN 0 ELSE curr.Hours + prev.LeftOver - (@EmployeeQty * 40) END) as LeftOver
         , curr.Hours + prev.LeftOver as running
    FROM weeks curr
    JOIN spread prev on curr.rownum = (prev.rownum + 1)
)
SELECT spread.Hours as NewHours
     , spread.LeftOver as PrevHours
     , spread.Capacity
     , spread.running as RunningTotal
     , CASE WHEN running < Capacity THEN running ELSE Capacity END as HoursThisWeek
FROM spread;

load multiple csv files into a DataFrame: column names issue

I have multiple csv files with the same format (14 rows, 4 columns).
I tried to load all of them into a single DataFrame, and to use each file's name to rename the values of the first column (1-14):
1 500 0 0
2 350 0 1
3 500 1 0
.............
13 600 0 0
14 800 0 0
I tried the following code but I am not getting what I am expecting:
filenames = os.listdir('Threshold/')
Y = pd.DataFrame()  # empty df

# file names are in the following format: "subx_ICA_thre.csv"
# need to get x (subject number, used later to rename column values)
Sub_list = []
for filename in filenames:
    s = int(''.join(filter(str.isdigit, filename)))
    Sub_list.append(int(s))
S_Sub_list = sorted(Sub_list)

for x in S_Sub_list:  # get the file according to the subject number
    temp = pd.read_csv('sub' + str(x) + '_ICA_thre.csv')
    df = pd.concat([Y, temp])  # concat the obtained frame with the empty frame
    df.columns = ['id', 'data', 'isEB', 'isEM']
    # replace the column values using the subject id
    for sub in range(1, 15):
        df['id'].replace(sub, 'sub' + str(x) + '_ICA_' + str(sub), inplace=True)
    print(df)
output:
id data isEB isEM
0 sub1_ICA_2 200 0 0
1 sub1_ICA_3 275 0 0
2 sub1_ICA_4 500 1 0
................................
11 sub1_ICA_13 275 0 0
12 sub1_ICA_14 300 0 0
id data isEB isEM
0 sub2_ICA_2 275 0 0
1 sub2_ICA_3 500 0 0
2 sub2_ICA_4 400 0 0
.................................
11 sub2_ICA_13 300 0 0
12 sub2_ICA_14 450 0 0
First, it seems that the code makes a separate DataFrame for each file, not a single one. Second, the first row is removed (sub1_ICA_1 is missing; it may have been consumed as the column names).
I couldn't find the problem in the loop that I am using.
I think you need to create a list of DataFrames first, then concat with the keys parameter, which adds new values from a range as a MultiIndex level, then modify column id, and last remove the MultiIndex with reset_index.
The names parameter was also added to read_csv for custom column names (it also stops the first data row being used as the header):
Y = []
for x in S_Sub_list:
    n = ['id', 'data', 'isEB', 'isEM']
    temp = pd.read_csv('sub' + str(x) + '_ICA_thre.csv', names=n)
    Y.append(temp)

# list comprehension alternative
# n = ['id', 'data', 'isEB', 'isEM']
# Y = [pd.read_csv('sub' + str(x) + '_ICA_thre.csv', names=n) for x in S_Sub_list]

df = pd.concat(Y, keys=range(1, len(S_Sub_list) + 1))
df['id'] = 'sub' + df.index.get_level_values(0).astype(str) + '_ICA_' + df['id'].astype(str)
df = df.reset_index(drop=True)
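
As a side note, the sorted file list in the question could also be built with glob plus a regular expression instead of os.listdir and digit filtering. A small sketch, assuming the same subX_ICA_thre.csv naming, pandas imported as pd, and that the script runs in the folder containing the files:

import glob
import re

# sort the paths by the subject number embedded in each file name
paths = sorted(glob.glob('sub*_ICA_thre.csv'),
               key=lambda p: int(re.search(r'\d+', p).group()))
Y = [pd.read_csv(p, names=['id', 'data', 'isEB', 'isEM']) for p in paths]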

percentage bins based on predefined buckets

I have a series of numbers and I would like to know what % of the numbers fall into each bucket defined in a dataframe.
df['cuts'] has 10, 20 and 50 as values. Specifically, I would like to know what % of the series is in [0-10], (10-20] and (20-50], and this should be appended to the df dataframe.
I wrote the following code. I definitely feel that it could be improved. Any help is appreciated.
bin_cuts = [-1] + list(df['cuts'].values)
out = pd.cut(series, bins=bin_cuts)
df_pct_bins = pd.value_counts(out, normalize=True).reset_index()
df_pct_bins = pd.concat([df_pct_bins['index'].str.split(', ', expand=True), df_pct_bins['cuts']], axis=1)
df_pct_bins[1] = df_pct_bins[1].str[:-1].astype(str)
df['cuts'] = df['cuts'].astype(str)
df_pct_bins = pd.merge(df, df_pct_bins, left_on='cuts', right_on=1)
Consider the sample data df and s
df = pd.DataFrame(dict(cuts=[10, 20, 50]))
s = pd.Series(np.random.randint(50, size=1000))
Option 1
np.searchsorted
c = df.cuts.values
df.assign(
    pct=df.cuts.map(
        pd.value_counts(
            c[np.searchsorted(c, s)],
            normalize=True
        )))
   cuts    pct
0    10  0.216
1    20  0.206
2    50  0.578
Option 2
pd.cut
c = df.cuts.values
df.assign(
    pct=df.cuts.map(
        pd.cut(
            s,
            np.append(-np.inf, c),
            labels=c
        ).value_counts(normalize=True)
    ))
   cuts    pct
0    10  0.216
1    20  0.206
2    50  0.578
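
Both options assume every value in s is no larger than the biggest cut. A value above 50 would raise an IndexError in Option 1 (searchsorted returns an out-of-range index into c) and fall into no bin (NaN) in Option 2. If that can happen, clipping the series first is a simple guard (a sketch; s_clipped is a name introduced here):

# force values above the last cut into the top bucket
s_clipped = s.clip(upper=c[-1])

Then use s_clipped in place of s in either option.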

Pandas groupby mean absolute deviation

I have a pandas dataframe like this:
Product Group  Product ID  Units Sold  Revenue  Rev/Unit
A              451         8           $16      $2
A              987         15          $40      $2.67
A              311         2           $5       $2.50
B              642         6           $18      $3.00
B              251         4           $28      $7.00
I want to transform it to look like this:
Product Group Units Sold Revenue Rev/Unit Mean Abs Deviation
A 25 $61 $2.44 $0.24
B 10 $46 $4.60 $2.00
The Mean Abs Deviation column is to be performed on the Rev/Unit column in the first table. The tricky thing is taking into account the respective weights behind the Rev/Unit calculation.
For example taking a straight MAD of Product Group A's Rev/Unit would yield $0.26. However after taking weight into consideration, the MAD would be $0.24.
I know to use groupby to get the simple summation for units sold and revenue, but I'm a bit lost on how to do the more complicated calculations of the next 2 columns.
Also, while we're giving advice/help: is there any easier way to create/paste tables into SO posts?
UPDATE:
Would a solution like this work? I know it will for the summation fields, but not sure how to implement for the latter 2 fields.
grouped_df = df.groupby("Product Group")
grouped_df.agg({
    'Units Sold': 'sum',
    'Revenue': 'sum',
    'Rev/Unit': 'Revenue'/'Units Sold',  # pseudocode
    'MAD': some_function})
You need to clarify what the "weights" are. I assumed the weights are the number of units sold, but that gives a different result from yours:
pv = df.pivot_table(index='Product Group',
                    values=['Units Sold', 'Revenue'],
                    aggfunc=sum)
pv['Rev/Unit'] = pv.Revenue / pv['Units Sold']
this gives:

               Revenue  Units Sold  Rev/Unit
Product Group
A                   61          25      2.44
B                   46          10      4.60
As for the weighted MAD (WMAD):

def wmad(prod):
    # unit-weighted mean absolute deviation around the group's Rev/Unit
    idx = df['Product Group'] == prod
    w = df['Units Sold'][idx]
    abs_dev = np.abs(df['Rev/Unit'][idx] - pv['Rev/Unit'][prod])
    return sum(abs_dev * w) / sum(w)

pv['Mean Abs Deviation'] = [wmad(idx) for idx in pv.index]
which, as I mentioned, gives a different result:

               Revenue  Units Sold  Rev/Unit  Mean Abs Deviation
Product Group
A                   61          25      2.44              0.2836
B                   46          10      4.60              1.9200
From your suggested solution, you can use a lambda function to operate on each group, e.g.:
'Rev/Unit': lambda x: calculate_revenue_per_unit(x)
Bear in mind that x is the Series of that column's values for each group, so your calculate_revenue_per_unit function receives the whole group's values rather than a single row.
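
Putting the pieces together with groupby.apply instead of pivot_table plus a separate loop: a minimal sketch of the same weighted logic, assuming the question's column names and that Revenue and Rev/Unit are already numeric (the $ signs stripped):

import numpy as np
import pandas as pd

def summarize(g):
    units = g['Units Sold'].sum()
    revenue = g['Revenue'].sum()
    rev_per_unit = revenue / units
    # unit-weighted mean absolute deviation around the group's Rev/Unit
    mad = (np.abs(g['Rev/Unit'] - rev_per_unit) * g['Units Sold']).sum() / units
    return pd.Series({'Units Sold': units, 'Revenue': revenue,
                      'Rev/Unit': rev_per_unit, 'Mean Abs Deviation': mad})

result = df.groupby('Product Group').apply(summarize)

On the sample data this reproduces the 0.2836 and 1.92 figures shown above.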