Need help in Data Analytics Calculations.
Currently, I am getting historical data for consumption as follows:
on above data, I am adding custom columns for calculating exact consumption(gallons) in no. of days. like:
Now, I have to plot month wise bar chart for consumption of respective Meter ID in 2016 year. But problem here is, I will have to calculate Every months consumption by dividing it in days in each respective month of 2016, and then only I will able to plot them monthly like:
y axis = consumption in every month
x axis = Jan Feb March Apr May Jun Jul Aug Sep Oct Nov Dec
so, in jan month, consumption should be = 10 + 100 + ((115/38) * 7) gallons
Notes: here, in ((115/38) * 7) : we are calculating avg consumption of single day 7 days in Jan and whole march and then getting last 7 day consumption of Jan so that we can add it in calculation of total consumption of Jan month
but how to add measure/custom column/new table for these calcualtions?
Thanks
What you need to do is relatively complicated, but the summary of my solution is:
Calculate the per-day consumption
Calculate the start and end date of each reading (e.g. the previous reading date plus one day, and the reading date)
Expand your data to have 1 row per day rather than 1-row per reading
You want to do these steps before you load the data into your data model (i.e. in your source system, or as the data is loaded using the Query Editor/Power Query).
Below, I assume you're using the Query Editor/Power Query. However, if you can use your source system, it's often the better choice (since the source system may be a database that is vastly faster than your desktop).
Note that your No. of Days calculation doesn't make sense to me. There are more than 38 days between 24 Jan 2016 and 31 Mar 2016. There are also more than 13 days between 10 Jan and 24 Jan. For this reason, it was difficult to tell whether you wanted a new reading to count on the day the previous reading was taken, or on the next full day. I assume the former. Also note, I've proceeded on the basis that your No. of Days calculation is correct
Calculate the Per Day Consumption
This is the easiest step, given that you have already calculated the Consumption and the No. of Days. Just divide one by the other. In the Query Editor, you can click in the Consumption (gallons) column and select Add Column > Standard > Divide. Under Value, choose Use values in a column and then select the No. of Days column.
Calculate the Start & End Date of Each Reading
The date of the reading is the end date, so you can rename Date to be End Date (since a reading is applied retroactively).
For the start date, in the Query Editor, you will need to add an index column (Add Column > Index Column). You will want to make sure your data is sorted by Meter ID and Date Ascending before doing this. Call the column Index.
Next, Add Column > Custom Column and pull the reading date from the prior row. Call the new column Previous End Date for now.
// A try is necessary because we can't get the previous row if there is no previous row (we'll get an error, which we can handle in the 'otherwise' block)
try
if
// See if the previous row is for the same Meter ID
[Meter ID] = #"Reordered Columns"{[Index] - 1}[Meter ID]
then
// If it is, grab the Reading Date from the previous row
#"Reordered Columns"{[Index]-1}[End Date]
else
// If this is the first reading for a meter, calculate the Start Date by subtracting the No. of Days from the End Date
Date.AddDays([End Date], -[No. of Days])
otherwise
// If this is the first row in the table, also calculate the Start Date by subtracting the No. of Days from the End Date
Date.AddDays([End Date], -[No. of Days])
Next, you'll want to add 1 to the Start Date, as we want the reading to apply to the day after the previous reading, not on the day of the previous reading.
Note, if you want the reading date to count in the prior period, subtract 1 from the End Date rather than add 1 to the start date (previous end date).
Expand your data to have 1 row per day
At this point, you should have a Meter ID, Start Date, End Date, and per day consumption column that reflects what you expect (i.e. the per day consumption is correct for the date range).
The final step is to duplicate each row for each date in the date range. There are several solutions to this outlined in this thread (https://community.powerbi.com/t5/Desktop/Convert-date-ranges-into-list-of-dates/td-p/129418), but personally, I recommend the technique (and video) posted by MarcelBeug (https://youtu.be/QSXzhb-EwHM).
You should end up with something more like this (after some removing & renaming of columns):
Finally
Now that you have one row per meter & date, with a per day consumption already calculated, you can build a visual. For example, you could do a column chart with Date on the Axis, and Consumption per Day as the value. By default, Power BI will recognize that Date is a date, and will roll it up by Year-Quarter-Month-Day. Press the little 'x' by Year and Quarter, and you'll have a chart that sums up the per day consumption by month. You can also drill down to individual date.
Further Reading
Reading a value from a previous row in Power Query
If Statements in Power Query
The AddDays function in Power Query
Adding Comments in Power Query
Catching Errors in Power Query
Converting a date range into a list of dates (Marcel Beug's solution)
A similar problem I previously answered
Related
I have a process where I have relationship between my sales table and my custom date table. My sales data gets its' filter from the custom date table. I have a process where I have 3 refreshes in 3 consecutive days starting from EOM of "previous month" and show last 12 calendar months.
I start this process end of every month i.e. For February data report I start on 02/28/2023 that will be my first refresh to filter sales data for last 12 calendar months (03/01/2022-02/28/2023). The next following day (03/01/2023) the second refresh need to filter (03/01/2022-02/28/2023).Then the final refresh is the next following day (03/02/2023) also need to filter (03/01/2022-02/28/2023). I couldn't use the standard filters in PBI since the eom is an exception. I have tried to create custom table using a formula where I add 7 days to today(02/28/2023) so I can go forward in time for my first day(02/28) and come back a month so I can get whole 12 calendar months. This wouldn't matter for the next month(03) since 7 days is not going to make any difference for month filter. I tried below custom table formula however it didn't work. Perhaps there is a better way to handle the exception indicated below
Table = CALENDAR(DATE(YEAR(TODAY())-1,MONTH(TODAY()),DAY(TODAY()+7)),TODAY())
My backend table has the data at a week level. It contain the current ISO year and current ISO week, as well as, the previous year's ISO year and week number that the current year's data should be compared with.
For each signup_iso_year-signup_iso_week combination, there exists only one iso_prev_year-iso_prev_yearweek combination.
The iso_prev_year, iso_prev_yearweek columns account for the offset that might occur due to certain years having 53 weeks instead of 52.
data table
(I can't embed images, so I have added a table here as well, although it has much less information than the image in 'data table').
Number_of_signups
signup_iso_year
signup_iso_week
iso_prev_year
iso_prev_yearweek
Country
grade_level
5
2020
18
2019
18
IN
middle school
7
2020
18
2019
18
US
high school
6
2021
17
2018
18
IN
middle school
8
2021
17
2018
18
US
high school
I want to calculate to Year-Over_Year Change in number_of_signups using the signup_iso_year, signup_iso_week, iso_prev_year, iso_prev_yearweek columns.
I have already tried to create a calculated column that contains the sum of number_of_signups from previous year, but since every combination of country, grade_level, subject, email_type might not exist in previous or current year, some of the values are getting lost and hence giving incorrect results.
The answer I am looking for, is a Power BI measure that can give me the YOY change based on signup_iso_year and signup_iso_week.
Edit: I should have mentioned this before, but I forgot. The table contains data from 2018 to current day. So, the data size is quite large. Also, I need this YoY measure for a time series visual, which means that I can't assign ISOyear/ISOweek values for previous year using simple MAX/MIN functions. It needs to pick values from the iso_prev_year, iso_prev_yearweek columns but since EARLIER function can't be used in a measure, I am not able to figure out how to do that.
Which is why I had tried to create a calculated column, and use the EARLIER function to compute previous year's number_of_signups. But because of the other columns present in the data, i.e., country, grade_level, subject, email_type, there were discrepancies occurring in the actual number_of_signups and the calculated previous_year_number_of_signups. These discrepancies were due to the fact that not every combination of these columns exists for each week, so we might miss out on some data when calculating previous_year_number_of_signups.
Edit 2: Was asked to include examples of what the expected result would look like, so adding some pictures.
YoY at overall level, country level, grade level
YOY at country+grade level
If I understand your requirement correct, you need a Measure like below. Remember, this may not the exact one you need, but this will definitely help you to reach your required output.
prev_signup =
var iso_prev_year = MIN(your_table_name[iso_prev_year])
var iso_prev_year_week = MIN(your_table_name[iso_prev_yearweek])
RETURN
CALCULATE(
SUM(your_table_name[Number_of_signups]),
FILTER(
ALL(your_table_name),
your_table_name[signup_iso_year] = iso_prev_year
&& your_table_name[signup_iso_week] = iso_prev_year_week
)
) + 0
You can also do some transformation in Power Query Editor and join the same table using your Key columns. That case, you can bring previous year's value in the same row. Rest is just compare 2 columns from your table to calculate YOY
I just want to select a range of days in a slicer and show in a table the number of days for each month/period (month-year).
I used DAX to create a table with the information I need and I don't have problems with the periods (first column), it changes dinamically, the problem is the column "Days" (second column) because it's always showing the total number of days for each month.
Here my DAX code
SelectedPeriods = GROUPBY(DimDate;DimDate[Period];"Days";COUNTX(CURRENTGROUP();DimDate[DateKey]))
Here the result
What I expect is:
2 for april, 31 for may, 1 for june
This is an issue with execution order.
SelectedPeriods = GROUPBY(DimDate;DimDate[Period];"Days";COUNTX(CURRENTGROUP();DimDate[DateKey]))
Generates a calculated table. These are calculated when the data model is refreshed and stored in it. They are not refreshed each time a connected dimension is changed within a dashboard.
In your case, while changing date filters may hide rows from this table the number of days remains fixed at the number calculated initially when there was no filter context on the data i.e. counting all days in the month.
If you want the result to change then you need to use a measure instead of a calculated table. Measures react to the current filter context within the report and so will adjust their output each time a slicer is changed.
The needed measure will depend on your model but might be something as simple as:
CountOfDays := CountRows(DimDate)
I am attempting to calculate the most recent 6-Month STDEVX.P (not including the current month; so in May 2017, I'd like to the STDEVX.P for periods Nov 2016 - Apr 2017) for sales by product in order to further calculate variation in sales orders.
The Sales Data is made up of daily transactions so it contains transaction date: iContractsChargebacks[TransactionDate] and units sold: iContractsChargebacks[ChargebackUnits], but if there are no sales in a given period, then there will be no data for that month.
So, for example, on July 1st, sales for the past 6 months were the following:
Jan 100
Feb 125
Apr 140
May 125
Jun 130
March is missing because there were no sales. So, when I calculate STDEVX.P on the data set, it is calculating it over 5 periods, when in fact there were 6, just one happens to be zero.
At the end of the day, I need to calculate STDEVX.P for the current six month period. If when pulling the monthly sales numbers, it only comes back with 3 periods(months), then it needs to assume the other 3 periods with a zero value.
I thought about manually calculating standard deviation instead of using the DAX STDEVX.P formula and found these 2 links as a reference on how to do so, the first being closest to my need:
https://community.powerbi.com/t5/Desktop/Problem-with-STDEV/td-p/19731
Calculating the standard deviation from columns of values and frequencies in Power BI...
I attempted to make a go of it, but still am not getting the correct calculation. My code is:
STDEVX2 =
var Averageprice=[6M Sales]
var months=6
return
SQRT(
DIVIDE(SUMX(
FILTER(ALL(DimDate),
DimDate[Month ID]<=(MAX(DimDate[Month ID])-1) &&
DimDate[Month ID]>=(MAX(DimDate[Month ID])-6)
),
(iContractsChargebacks[SumOfOrderQuantity]-Averageprice)^2),
months
)
)
*note: Instead of using date parameters in the code, I created a calculated column in the date table that gives each Month a unique ID, makes it easier for me.
Your question would definitely be easier to answer with more explanation regarding your model. E.g. how you defined [SumOfOrderQuantity] and [6M Sales], since a mistake there could definitely impact the final result. Also, knowing what the result you're seeing is vs. the result you expect would be helpful (using sample data).
My guess, however, is that your DimDate table is a standard date table (with one row per date), but you want standard deviation by month.
The FILTER statement in your formula limits the date range to the prior 6 full months correctly, but it will still have one row per date. You can confirm this in Power BI by going into the Data View, selecting 'New Table' under Modeling on the ribbon, and putting your FILTER statement in:
Table = FILTER(ALL(DimDate),
DimDate[MonthID]<=(MAX(DimDate[MonthID])-1) &&
DimDate[MonthID]>=(MAX(DimDate[MonthID])-6))
Assuming you have more than one day of sales for a given month, calculating the variance by day rather than by month is going to mess things up.
What I'd suggest trying:
Table = FILTER(SUMMARIZE(ALL(DimDate),[MonthID]),
DimDate[MonthID]<=(MAX(DimDate[MonthID])-1) &&
DimDate[MonthID]>=(MAX(DimDate[MonthID])-6))
The additional SUMMARIZE statement means that you only get one row for each MonthID, rather than 1 row for each date. If your [6M Sales] is the monthly average across all 6 months, and [SumOfOrderQuantity] is the monthly sum for each month, then you should be set to go calculating the variance, squaring, dividing by 6, and square rooting.
If you need to do further troubleshooting, remember you can put a table on your canvas with MonthID, SumOfOrderQuantity and [6M Sales] and compare the numbers you expect at each stage of the calculation with the numbers you're seeing.
Hope this helps.
I was facing a similar problem while trying to calculate the coefficient of variation (Std. /Mean) by SKUS from sales data. I could use the Pivot-Unpivot function in Power Query editor to to do away with the problem of months with missing sales:
1) Export the data with any calculated columns
2) Reimport the data so that the calculated columns are also available in the power query editor
3) Pivoted the data by months
4) Replaced null values with 0s
5) Unpivoted the data
6) Close and apply the query
7) Add a calculated column for the coefficient of variation using the formula
CV = CALCULATE(STDEV.P(Table1[Value]),ALLEXCEPT(Table1,Table1[Product]))/CALCULATE(AVERAGE(Table1[Value]),ALLEXCEPT(Table1,Table1[Product]))
Thus zero sales for the missing months will also be considered both for Standard Deviation and Mean.
I would like to create a calculated column or measure that shows the time elapsed in any given month. Preferably as a percentage.
I need to be able to compare productivity (rolling total) over a month's period between different months.So creating a percentage of time passed in a month would put every month on a level playing field.
Is there a way to do this?
Or is there a better way to compare productivity between 2 months on a rolling basis?
EDIT
I am graphing sales on a cumulative basis. Here is a picture of my graph to demonstrate.[][
Ideally I would like to be able to graph another person's sales on the same graph for a different month to compare.
The problem is each month is different and I don't think power bi allows much customization of the axes.
So I figured a potential solution would be to convert months to percentages of time passed, create two separate graphs and place them on top of each other to show the comparison of sales.
Using percentages doesn't sound right here: one person's "productivity" in February will appear lower than another person's productivity in March just because February has 3 less days.
Just use [Date].[Day].
To answer the original question (even though it shouldn't be used for this), month progress percentage calculated column:
MonthProgress% =
var DaysinMonth = DAY(
IF(
MONTH(MyTable[date]) = 12,
DATE(YEAR(MyTable[date]) + 1,1,1),
DATE(YEAR(MyTable[date]), MONTH(MyTable[date]) + 1, 1)
) - 1
)
return MyTable[date].[Day]/DaysinMonth*100
Also check DAX functions PARALLELPERIOD and DATEADD.
This is the solution I settled on.
I customized the ranges for the x and y axes so they match.
For the y-axis, I simply put the range from 0 to 50+ our highest month.
For the x-axis, I created a column with the DAY function so I got a number assigned to each day of the month which allowed me to manually set the chart range from 0 to 31. I have asked another question on how to only get workdays.