I am attempting to calculate the most recent 6-Month STDEVX.P (not including the current month; so in May 2017, I'd like to the STDEVX.P for periods Nov 2016 - Apr 2017) for sales by product in order to further calculate variation in sales orders.
The Sales Data is made up of daily transactions so it contains transaction date: iContractsChargebacks[TransactionDate] and units sold: iContractsChargebacks[ChargebackUnits], but if there are no sales in a given period, then there will be no data for that month.
So, for example, on July 1st, sales for the past 6 months were the following:
Jan 100
Feb 125
Apr 140
May 125
Jun 130
March is missing because there were no sales. So, when I calculate STDEVX.P on the data set, it is calculating it over 5 periods, when in fact there were 6, just one happens to be zero.
At the end of the day, I need to calculate STDEVX.P for the current six month period. If when pulling the monthly sales numbers, it only comes back with 3 periods(months), then it needs to assume the other 3 periods with a zero value.
I thought about manually calculating standard deviation instead of using the DAX STDEVX.P formula and found these 2 links as a reference on how to do so, the first being closest to my need:
https://community.powerbi.com/t5/Desktop/Problem-with-STDEV/td-p/19731
Calculating the standard deviation from columns of values and frequencies in Power BI...
I attempted to make a go of it, but still am not getting the correct calculation. My code is:
STDEVX2 =
var Averageprice=[6M Sales]
var months=6
return
SQRT(
DIVIDE(SUMX(
FILTER(ALL(DimDate),
DimDate[Month ID]<=(MAX(DimDate[Month ID])-1) &&
DimDate[Month ID]>=(MAX(DimDate[Month ID])-6)
),
(iContractsChargebacks[SumOfOrderQuantity]-Averageprice)^2),
months
)
)
*note: Instead of using date parameters in the code, I created a calculated column in the date table that gives each Month a unique ID, makes it easier for me.
Your question would definitely be easier to answer with more explanation regarding your model. E.g. how you defined [SumOfOrderQuantity] and [6M Sales], since a mistake there could definitely impact the final result. Also, knowing what the result you're seeing is vs. the result you expect would be helpful (using sample data).
My guess, however, is that your DimDate table is a standard date table (with one row per date), but you want standard deviation by month.
The FILTER statement in your formula limits the date range to the prior 6 full months correctly, but it will still have one row per date. You can confirm this in Power BI by going into the Data View, selecting 'New Table' under Modeling on the ribbon, and putting your FILTER statement in:
Table = FILTER(ALL(DimDate),
DimDate[MonthID]<=(MAX(DimDate[MonthID])-1) &&
DimDate[MonthID]>=(MAX(DimDate[MonthID])-6))
Assuming you have more than one day of sales for a given month, calculating the variance by day rather than by month is going to mess things up.
What I'd suggest trying:
Table = FILTER(SUMMARIZE(ALL(DimDate),[MonthID]),
DimDate[MonthID]<=(MAX(DimDate[MonthID])-1) &&
DimDate[MonthID]>=(MAX(DimDate[MonthID])-6))
The additional SUMMARIZE statement means that you only get one row for each MonthID, rather than 1 row for each date. If your [6M Sales] is the monthly average across all 6 months, and [SumOfOrderQuantity] is the monthly sum for each month, then you should be set to go calculating the variance, squaring, dividing by 6, and square rooting.
If you need to do further troubleshooting, remember you can put a table on your canvas with MonthID, SumOfOrderQuantity and [6M Sales] and compare the numbers you expect at each stage of the calculation with the numbers you're seeing.
Hope this helps.
I was facing a similar problem while trying to calculate the coefficient of variation (Std. /Mean) by SKUS from sales data. I could use the Pivot-Unpivot function in Power Query editor to to do away with the problem of months with missing sales:
1) Export the data with any calculated columns
2) Reimport the data so that the calculated columns are also available in the power query editor
3) Pivoted the data by months
4) Replaced null values with 0s
5) Unpivoted the data
6) Close and apply the query
7) Add a calculated column for the coefficient of variation using the formula
CV = CALCULATE(STDEV.P(Table1[Value]),ALLEXCEPT(Table1,Table1[Product]))/CALCULATE(AVERAGE(Table1[Value]),ALLEXCEPT(Table1,Table1[Product]))
Thus zero sales for the missing months will also be considered both for Standard Deviation and Mean.
Related
My backend table has the data at a week level. It contain the current ISO year and current ISO week, as well as, the previous year's ISO year and week number that the current year's data should be compared with.
For each signup_iso_year-signup_iso_week combination, there exists only one iso_prev_year-iso_prev_yearweek combination.
The iso_prev_year, iso_prev_yearweek columns account for the offset that might occur due to certain years having 53 weeks instead of 52.
data table
(I can't embed images, so I have added a table here as well, although it has much less information than the image in 'data table').
Number_of_signups
signup_iso_year
signup_iso_week
iso_prev_year
iso_prev_yearweek
Country
grade_level
5
2020
18
2019
18
IN
middle school
7
2020
18
2019
18
US
high school
6
2021
17
2018
18
IN
middle school
8
2021
17
2018
18
US
high school
I want to calculate to Year-Over_Year Change in number_of_signups using the signup_iso_year, signup_iso_week, iso_prev_year, iso_prev_yearweek columns.
I have already tried to create a calculated column that contains the sum of number_of_signups from previous year, but since every combination of country, grade_level, subject, email_type might not exist in previous or current year, some of the values are getting lost and hence giving incorrect results.
The answer I am looking for, is a Power BI measure that can give me the YOY change based on signup_iso_year and signup_iso_week.
Edit: I should have mentioned this before, but I forgot. The table contains data from 2018 to current day. So, the data size is quite large. Also, I need this YoY measure for a time series visual, which means that I can't assign ISOyear/ISOweek values for previous year using simple MAX/MIN functions. It needs to pick values from the iso_prev_year, iso_prev_yearweek columns but since EARLIER function can't be used in a measure, I am not able to figure out how to do that.
Which is why I had tried to create a calculated column, and use the EARLIER function to compute previous year's number_of_signups. But because of the other columns present in the data, i.e., country, grade_level, subject, email_type, there were discrepancies occurring in the actual number_of_signups and the calculated previous_year_number_of_signups. These discrepancies were due to the fact that not every combination of these columns exists for each week, so we might miss out on some data when calculating previous_year_number_of_signups.
Edit 2: Was asked to include examples of what the expected result would look like, so adding some pictures.
YoY at overall level, country level, grade level
YOY at country+grade level
If I understand your requirement correct, you need a Measure like below. Remember, this may not the exact one you need, but this will definitely help you to reach your required output.
prev_signup =
var iso_prev_year = MIN(your_table_name[iso_prev_year])
var iso_prev_year_week = MIN(your_table_name[iso_prev_yearweek])
RETURN
CALCULATE(
SUM(your_table_name[Number_of_signups]),
FILTER(
ALL(your_table_name),
your_table_name[signup_iso_year] = iso_prev_year
&& your_table_name[signup_iso_week] = iso_prev_year_week
)
) + 0
You can also do some transformation in Power Query Editor and join the same table using your Key columns. That case, you can bring previous year's value in the same row. Rest is just compare 2 columns from your table to calculate YOY
I have a "strange" problem in visualizing the values of the "last" year (that is the most current year with data of a full year). In the example delivered it is 2019 which holds the bookings facts. The year should be derived by a related table (LetztesVollesJahrDim) that holds a min/max date and year value for each company/entity. The DateDim table holds all dates and is configured to be the Date dimension.
The relations are shown in the graphic below.
I created a Report that should display different matrix tables with different values:
one, that shows the last full year's monthly values by Cost Center (which works correct)
one, that shows all years monthly values including an estimated value for the current year (2020) (which also works correct)
one, that could drill down to the details level of the Facts and display the figures per month (in columns) of the last full year (i.e. 2019) putting the Cost Center and other groups down the most detailed level into rows. This Matrix makes use of Calculated Measures created with Tabular Editor.
At this point I can here you say... try it without the Calculated measures and indeed I did that by simply displaying a Card that got a Visual Filter to DateDim[M]=7 to simulate the appliance of each matrix's column. Problem is the same: The Month filter of the Visual (or within the Calculated Measure) is ignored and the yearly sum is displayed (~48k) which is wrong.
As I am currently an Expert in SQL Server DBMS but not in the Architecture of Power BI's DAX Models I am not truely aware about the consequences when and why filters are removed, ignored, overwritten or added at which level of modeling.
Originally I tried to create a simple Calculated Measure that reflects the last full year of the company. That works, but its usage in the huge Matrix was impossible as it was calculating forever. That's why I created the simple table "LetztesVollesJahrDim" to hold a persisted value for each company.
The idea is simple: Create a Query that inner joins these tables and display the Sum for each month [M] like this:
Fact[Turnover] - Fact[CompanyKey] -> CompanyDim[CompanyKey] <-> LastFullYear[CompanyKey] - LastFullYear[MostCurrentYear] -> DateDim[Y] - DateDim[DateKey] -> Fact[StapleDateKey]
So, what is the Problem?
I tried a couple of DAX queries and all come up with a different wrong value.
Following three different approaches for a Calculated Measure "Sum LY":
1.
Σ LY = CALCULATE(SUM(KontobuchungenFact[UmsatzNegiert]), DATESBETWEEN(DateDim[DateKey], DATE(2019, 1, 1), DATE(2019, 12,31)))
2.
Σ LY = CALCULATE(SUM(KontobuchungenFact[UmsatzNegiert]), SAMEPERIODLASTYEAR(DateDim[DateKey]))
[
3.
Σ LY = CALCULATE(SUM(KontobuchungenFact[UmsatzNegiert]), KEEPFILTERS(DateDim[M]), USERELATIONSHIP(DateDim[J], LetztesVollesJahrDim[AktuellstesJahr]), USERELATIONSHIP(DateDim[DateKey], KontobuchungenFact[StapelDateKey]))
Filters being applied on the page/visual:
CompanyKey
DateKey >= 2018-01-01 (page filter that limits the displayed rows in yearly matrix to the last 3 ys)
Some other irrelevant keys
Visual configuration:
Relations:
Values/Rows in the LetztesVollesJahrDim table:
Calculated Measures/Table in Tabular Editor showing the calculation of the "01 Jan" column of the first Matrix, which displays correct results:
Impression of the Report:
So in summary I need a clue/DAX formula that recognizes the Monthes in each column and ideally uses the relations to the last year table trespassing the year's filter through DateDim to the Facts.
It is funny, that the upper matrix works, but not the one on the bottom. It is not possible to use the calculated measures approach of the first matrix in the last matrix because the performance would drop to > minutes calculation. So I cannot use the same approach and need a fast one.
Anybody an idea? :-)
I just want to select a range of days in a slicer and show in a table the number of days for each month/period (month-year).
I used DAX to create a table with the information I need and I don't have problems with the periods (first column), it changes dinamically, the problem is the column "Days" (second column) because it's always showing the total number of days for each month.
Here my DAX code
SelectedPeriods = GROUPBY(DimDate;DimDate[Period];"Days";COUNTX(CURRENTGROUP();DimDate[DateKey]))
Here the result
What I expect is:
2 for april, 31 for may, 1 for june
This is an issue with execution order.
SelectedPeriods = GROUPBY(DimDate;DimDate[Period];"Days";COUNTX(CURRENTGROUP();DimDate[DateKey]))
Generates a calculated table. These are calculated when the data model is refreshed and stored in it. They are not refreshed each time a connected dimension is changed within a dashboard.
In your case, while changing date filters may hide rows from this table the number of days remains fixed at the number calculated initially when there was no filter context on the data i.e. counting all days in the month.
If you want the result to change then you need to use a measure instead of a calculated table. Measures react to the current filter context within the report and so will adjust their output each time a slicer is changed.
The needed measure will depend on your model but might be something as simple as:
CountOfDays := CountRows(DimDate)
I would like to create a calculated column or measure that shows the time elapsed in any given month. Preferably as a percentage.
I need to be able to compare productivity (rolling total) over a month's period between different months.So creating a percentage of time passed in a month would put every month on a level playing field.
Is there a way to do this?
Or is there a better way to compare productivity between 2 months on a rolling basis?
EDIT
I am graphing sales on a cumulative basis. Here is a picture of my graph to demonstrate.[][
Ideally I would like to be able to graph another person's sales on the same graph for a different month to compare.
The problem is each month is different and I don't think power bi allows much customization of the axes.
So I figured a potential solution would be to convert months to percentages of time passed, create two separate graphs and place them on top of each other to show the comparison of sales.
Using percentages doesn't sound right here: one person's "productivity" in February will appear lower than another person's productivity in March just because February has 3 less days.
Just use [Date].[Day].
To answer the original question (even though it shouldn't be used for this), month progress percentage calculated column:
MonthProgress% =
var DaysinMonth = DAY(
IF(
MONTH(MyTable[date]) = 12,
DATE(YEAR(MyTable[date]) + 1,1,1),
DATE(YEAR(MyTable[date]), MONTH(MyTable[date]) + 1, 1)
) - 1
)
return MyTable[date].[Day]/DaysinMonth*100
Also check DAX functions PARALLELPERIOD and DATEADD.
This is the solution I settled on.
I customized the ranges for the x and y axes so they match.
For the y-axis, I simply put the range from 0 to 50+ our highest month.
For the x-axis, I created a column with the DAY function so I got a number assigned to each day of the month which allowed me to manually set the chart range from 0 to 31. I have asked another question on how to only get workdays.
Need help in Data Analytics Calculations.
Currently, I am getting historical data for consumption as follows:
on above data, I am adding custom columns for calculating exact consumption(gallons) in no. of days. like:
Now, I have to plot month wise bar chart for consumption of respective Meter ID in 2016 year. But problem here is, I will have to calculate Every months consumption by dividing it in days in each respective month of 2016, and then only I will able to plot them monthly like:
y axis = consumption in every month
x axis = Jan Feb March Apr May Jun Jul Aug Sep Oct Nov Dec
so, in jan month, consumption should be = 10 + 100 + ((115/38) * 7) gallons
Notes: here, in ((115/38) * 7) : we are calculating avg consumption of single day 7 days in Jan and whole march and then getting last 7 day consumption of Jan so that we can add it in calculation of total consumption of Jan month
but how to add measure/custom column/new table for these calcualtions?
Thanks
What you need to do is relatively complicated, but the summary of my solution is:
Calculate the per-day consumption
Calculate the start and end date of each reading (e.g. the previous reading date plus one day, and the reading date)
Expand your data to have 1 row per day rather than 1-row per reading
You want to do these steps before you load the data into your data model (i.e. in your source system, or as the data is loaded using the Query Editor/Power Query).
Below, I assume you're using the Query Editor/Power Query. However, if you can use your source system, it's often the better choice (since the source system may be a database that is vastly faster than your desktop).
Note that your No. of Days calculation doesn't make sense to me. There are more than 38 days between 24 Jan 2016 and 31 Mar 2016. There are also more than 13 days between 10 Jan and 24 Jan. For this reason, it was difficult to tell whether you wanted a new reading to count on the day the previous reading was taken, or on the next full day. I assume the former. Also note, I've proceeded on the basis that your No. of Days calculation is correct
Calculate the Per Day Consumption
This is the easiest step, given that you have already calculated the Consumption and the No. of Days. Just divide one by the other. In the Query Editor, you can click in the Consumption (gallons) column and select Add Column > Standard > Divide. Under Value, choose Use values in a column and then select the No. of Days column.
Calculate the Start & End Date of Each Reading
The date of the reading is the end date, so you can rename Date to be End Date (since a reading is applied retroactively).
For the start date, in the Query Editor, you will need to add an index column (Add Column > Index Column). You will want to make sure your data is sorted by Meter ID and Date Ascending before doing this. Call the column Index.
Next, Add Column > Custom Column and pull the reading date from the prior row. Call the new column Previous End Date for now.
// A try is necessary because we can't get the previous row if there is no previous row (we'll get an error, which we can handle in the 'otherwise' block)
try
if
// See if the previous row is for the same Meter ID
[Meter ID] = #"Reordered Columns"{[Index] - 1}[Meter ID]
then
// If it is, grab the Reading Date from the previous row
#"Reordered Columns"{[Index]-1}[End Date]
else
// If this is the first reading for a meter, calculate the Start Date by subtracting the No. of Days from the End Date
Date.AddDays([End Date], -[No. of Days])
otherwise
// If this is the first row in the table, also calculate the Start Date by subtracting the No. of Days from the End Date
Date.AddDays([End Date], -[No. of Days])
Next, you'll want to add 1 to the Start Date, as we want the reading to apply to the day after the previous reading, not on the day of the previous reading.
Note, if you want the reading date to count in the prior period, subtract 1 from the End Date rather than add 1 to the start date (previous end date).
Expand your data to have 1 row per day
At this point, you should have a Meter ID, Start Date, End Date, and per day consumption column that reflects what you expect (i.e. the per day consumption is correct for the date range).
The final step is to duplicate each row for each date in the date range. There are several solutions to this outlined in this thread (https://community.powerbi.com/t5/Desktop/Convert-date-ranges-into-list-of-dates/td-p/129418), but personally, I recommend the technique (and video) posted by MarcelBeug (https://youtu.be/QSXzhb-EwHM).
You should end up with something more like this (after some removing & renaming of columns):
Finally
Now that you have one row per meter & date, with a per day consumption already calculated, you can build a visual. For example, you could do a column chart with Date on the Axis, and Consumption per Day as the value. By default, Power BI will recognize that Date is a date, and will roll it up by Year-Quarter-Month-Day. Press the little 'x' by Year and Quarter, and you'll have a chart that sums up the per day consumption by month. You can also drill down to individual date.
Further Reading
Reading a value from a previous row in Power Query
If Statements in Power Query
The AddDays function in Power Query
Adding Comments in Power Query
Catching Errors in Power Query
Converting a date range into a list of dates (Marcel Beug's solution)
A similar problem I previously answered