I have data structured like this
Date, Group, Value
1/1/2020, A, 10
2/1/2020, A, 5
3/1/2020, A, 7
1/1/2020, B, 1
2/1/2020, B, 3
3/1/2020, B, 7
1/1/2020, C, 1
2/1/2020, C, 3
3/1/2020, C, 7
With ~20 Groups.
I want to create a pie or treemap chart that shows, for example, the top 5 values and the rest as OTHER.
It's pretty simple to do once, but what if I have a slicer that filters the chart by years or months, acquired from a Dates table connected to the Date column?
So when I select only 2020, it should show Top N + Others for 2020 only, and not for all data.
A TopN visual-level filter alone isn't enough, since the values on the chart then sum to 100%, and I want to see the real percentage relative to all Groups.
Tangent: I wrote a lengthy reply telling you how to make a proper measure that would return the correct percentage, and in the end realized TopN would make it sum up to 100% anyway. So, always keep that in mind: if you pay attention, you'll learn even when you teach ;) Anyway,
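What I would try then is a dedicated measure, such as
[Value % (top N)] =
VAR denominator = CALCULATE ( [Value], ALLSELECTED ( Data_Table ) )
-- rank over every group still visible after slicers; VALUES would only see the current group on the visual
VAR groupRank = RANKX ( ALLSELECTED ( Data_Table[Group] ), [Value] )
-- replace N with the number of groups to keep, e.g. 5
RETURN IF ( groupRank <= N, DIVIDE ( [Value], denominator ) )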
and then, remove the TOPN filter from the visual. For groups which are not in the top N, the measure will return blank, so they won't show on a visual.
If, however, you want to show the "Others" line, with the rest of the sum, then you need to work a little more. For one thing, you need to have a line with "Others" for the group, on the same table as the actual groups you're summing, and the measure will need more branching logic to take that into account.
But the basic logic is still the same, using RANKX over the values in the Group column, then returning blank for most items (IF() by default returns a blank if the condition is false, but you can always have the measure return BLANK() yourself, in some edge cases).
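One way the "Others" variant could look, as a rough sketch rather than a tested solution. It swaps in a slightly different layout from the one described above: instead of adding an "Others" row to the data table itself, it assumes a separate, disconnected calculated table (hypothetically named GroupAxis) that goes on the visual in place of Data_Table[Group]. The [Value] measure and Data_Table come from the answer above; GroupAxis, N = 5 and the measure name are assumptions. In Power BI the table and the measure would be entered as two separate definitions.

```dax
-- Calculated table: every group plus one extra "Others" row (hypothetical name, no relationships)
GroupAxis =
UNION (
    DISTINCT ( Data_Table[Group] ),
    ROW ( "Group", "Others" )
)

-- Measure: put GroupAxis[Group] on the visual instead of Data_Table[Group]
Value % (top N + Others) =
VAR N = 5
VAR currentGroup = SELECTEDVALUE ( GroupAxis[Group] )
-- GroupAxis does not filter Data_Table, so [Value] here is the total under the current slicers
VAR denominator = [Value]
VAR topGroups = TOPN ( N, VALUES ( Data_Table[Group] ), [Value], DESC )
VAR topTotal = SUMX ( topGroups, [Value] )
VAR numerator =
    IF (
        currentGroup = "Others",
        denominator - topTotal,
        -- blank for groups outside the top N, so they drop off the visual
        SUMX ( FILTER ( topGroups, Data_Table[Group] = currentGroup ), [Value] )
    )
RETURN
    DIVIDE ( numerator, denominator )
```

Because the top N set is recomputed inside the measure, it follows whatever year or month the slicer selects. Note that TOPN can return more than N rows when values tie, and that the measure returns blank on a total row where no single GroupAxis[Group] is selected.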
Related
I am trying to work with COVID data and find the day-on-day increase in cases. Essentially, take today's value and subtract yesterday's value to get the increase. My data starts on April 10th, so if the date is April 10th, I want to return 0.
Given the below formula, the 0 is correctly returned for April 10th but all other values return 17908. All column types are 'Whole number'. Can anybody give me some information on this? Apologies if this is an obvious issue, I am used to working with Python and R and have been thrust into Power BI.
My data is very simple. It just continues like this:
ID Date No of cases
1) 10 April 3
1) 11 April 6
1) 12 April 15
Diff_Daily =
VAR blankValue = 0
VAR difference =
SUM ( Table[No of cases] )
- CALCULATE ( SUM ( Table[No of cases] ), PREVIOUSDAY ( Table[Date] ) )
RETURN
IF ( Table[Date].[Date] = DATE ( 2020, 04, 10 ), blankValue, difference )
In order to solve that I made the following measures. I made more than one just because to me it usually makes more sense (you can re-use them in a KPI or other charts) but you can merge them together if you don't need them all.
Cases = SUM(MyTable[No Of Cases])
Cases (prev Day) =
CALCULATE(
[Cases]
,PREVIOUSDAY(MyTable[Date])
)
Daily Delta =
IF(ISBLANK([Cases (prev Day)])
,0
,[Cases] - [Cases (prev Day)]
)
Let me know if this helps.
About your formula: it looks fine to me. I suggest you check the data types of your columns, especially the date one. Other than that, the only error I see is the use of a column inside the IF statement; you may want to use ISBLANK([MyMeasure]) instead (at least in this case).
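If the intermediate measures are not needed elsewhere, the three measures above could be merged into one along these lines. This is only a sketch that keeps the MyTable and column names from the answer; the measure name and variable names are made up for illustration.

```dax
Daily Delta (single measure) =
VAR casesToday = SUM ( MyTable[No Of Cases] )
VAR casesPrevDay =
    CALCULATE ( SUM ( MyTable[No Of Cases] ), PREVIOUSDAY ( MyTable[Date] ) )
RETURN
    -- the first date in the data has no previous day, so report 0 there
    IF ( ISBLANK ( casesPrevDay ), 0, casesToday - casesPrevDay )
```

The logic is unchanged; variables simply replace the separate [Cases] and [Cases (prev Day)] measures.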
I am trying to rank each row in a daily stock price table to find the previous day's closing price:
The code I use is:
rank =
RANKX(
FILTER(
ALL(NSE_DAILY_REPORT),
NSE_DAILY_REPORT[SYMBOL]="ADLABS"
),
MAX(NSE_DAILY_REPORT[TDATE]),,
ASC
)
The problem is that it returns a rank of 1 for all rows.
Try changing MAX(NSE_DAILY_REPORT[TDATE]) to NSE_DAILY_REPORT[SCLOSE]
The second argument is an expression that is evaluated for each row of the filtered table and then compared with its value for the current row. MAX ( NSE_DAILY_REPORT[TDATE] ) is an aggregate that ignores the row being iterated, so it returns the same value every time, every comparison is a tie, and every row gets rank 1.
So if I understand correctly your goal is to get the closing price of the day before?
In that case RANKX() is not necessary, in contrast to the post you shared as an example. There, they create ordinality by building a ranking first and then perform a pretty inefficient calculation to get the previous row in rank. You already have ordinality because you have a date column. Power BI already knows how to interpret that scale, so getting a value for a previous day does not need an additional ranking.
There are many posts on Stack Overflow dealing with this problem. Have a look around to learn more. For your particular problem, the solution will be a calculated column with code looking something like:
PreviousDay =
CALCULATE (
SUM ( NSE_DAILY_REPORT[SCLOSE] ),
FILTER (
ALLEXCEPT ( NSE_DAILY_REPORT, NSE_DAILY_REPORT[SYMBOL] ),
NSE_DAILY_REPORT[TDATE] = EARLIER(NSE_DAILY_REPORT[TDATE]) - 1
)
)
It will do the trick, but it still has some inefficiencies that you can improve on by looking through other examples.
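One limitation of subtracting 1 from the date is that it finds nothing when the previous calendar day has no row, for example after a weekend. A possible variant, sketched here under the assumption that the table has at most one row per SYMBOL and TDATE, looks up the latest earlier date for the same symbol instead. The column name PreviousClose and the variable names are hypothetical.

```dax
PreviousClose =
VAR currentSymbol = NSE_DAILY_REPORT[SYMBOL]
VAR currentDate = NSE_DAILY_REPORT[TDATE]
-- latest date before the current row's date, for the same symbol
VAR previousDate =
    MAXX (
        FILTER (
            NSE_DAILY_REPORT,
            NSE_DAILY_REPORT[SYMBOL] = currentSymbol
                && NSE_DAILY_REPORT[TDATE] < currentDate
        ),
        NSE_DAILY_REPORT[TDATE]
    )
RETURN
    -- closing price on that date; blank for the first row of each symbol
    MAXX (
        FILTER (
            NSE_DAILY_REPORT,
            NSE_DAILY_REPORT[SYMBOL] = currentSymbol
                && NSE_DAILY_REPORT[TDATE] = previousDate
        ),
        NSE_DAILY_REPORT[SCLOSE]
    )
```

Using variables instead of EARLIER keeps the current row's values readable, at the cost of scanning the table twice per row.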
I am trying to aggregate the following values (NHS, Social Care and Both B) by the Reason For Delay column so I can find the reason with the highest combined value.
I have tried using SUMMARIZE to create a table with just the Reason For Delay, NHS, Social Care and Both B columns. I hoped I could then add a column named totals that adds the NHS, Social Care and Both B columns together in this summarized table, giving me the total for each reason for delay.
However, when I wrap a MAXX function around my totals column, it gives me the wrong values.
I have tried wrapping my table in DISTINCT so it aggregates all the columns in my SUMMARIZE together, but this did not help either.
Max Delays =
MAXX (
SUMMARIZE (
csv,
csv[Reason For Delay],
csv[NHS],
csv[Social Care],
csv[Both B],
"totals", CALCULATE ( SUM ( csv[NHS] ) + SUM ( csv[Both B] ) + SUM ( csv[Social Care] ) )
),
[totals]
)
The smaller table (which should represent the summarized table) in the above picture with the total column shows the values I expect to run my max calculation over; I expect the max value to be 277.
The max value I am getting instead is 182. This is the max value in the unsummarized table below, where my Reason For Delay column has multiple duplicates and 182 is the highest value.
I have uploaded a sample of the pbix file I am working on, in case it helps: https://www.zeta-uploader.com/en/1184250523
First, create a measure for total reasons:
Total Reasons = SUM(csv[NHS]) + SUM(csv[Both B]) + SUM(csv[Social Care])
Second, create a measure for the max reason:
Max Reason = MAXX( VALUES(csv[Reason For Delay]), [Total Reasons])
Result:
How it works:
The first measure is for convenience. You can re-use it in other formulas, making code cleaner;
In the second measure, we create a list of distinct reasons using VALUES. Then MAXX iterates over this list, calculates the total for each reason, and returns the largest of them.
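For comparison, the original SUMMARIZE attempt most likely failed because NHS, Social Care and Both B were listed as group-by columns, which yields one row per distinct combination of values rather than one row per reason. A sketch of a corrected version of that approach, plus a hypothetical companion measure that returns the name of the top reason, might look like this. The csv table, its columns and [Total Reasons] come from the thread; Top Reason is an assumed name.

```dax
-- Group by the reason only, then add the total as an extension column
Max Delays =
MAXX (
    ADDCOLUMNS (
        SUMMARIZE ( csv, csv[Reason For Delay] ),
        "totals", CALCULATE ( SUM ( csv[NHS] ) + SUM ( csv[Both B] ) + SUM ( csv[Social Care] ) )
    ),
    [totals]
)

-- Name of the reason with the highest total (several names if they tie)
Top Reason =
VAR topRow = TOPN ( 1, VALUES ( csv[Reason For Delay] ), [Total Reasons], DESC )
RETURN
    CONCATENATEX ( topRow, csv[Reason For Delay], ", " )
```

Both should agree with the VALUES-based Max Reason measure above; the shorter measure is still the simpler choice.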
Charts and visuals in Power BI can be filtered to the top N records for easier at-a-glance reporting. However, if I want to report the top and bottom records (e.g. the top 10 and bottom 10 dollar amounts), I need to place two visuals. This consumes more report space and breaks the cohesion of the page.
Is there a way to select two data subsets in one graph for this kind of reporting?
Here is the sample data I threw together.
On top of that, I created a simple measure for the Total Amount.
Total Amount = SUM(Data[Amount])
With that I created a new measure that will essentially flag each row as being in the Top or Bottom 3 (you can change the number to meet your needs).
This measure first checks if there is a value for Total Amount and "removes" any that have a blank value ("removes" by making the flag blank and thus will never be included in any filtering or such).
TopBottom =
IF(
ISBLANK([Total Amount]),
BLANK(),
IF(
RANKX(ALL(Data), [Total Amount], , ASC) <= 3 || RANKX(ALL(Data), [Total Amount], , DESC) <= 3,
1,
0
)
)
Once you have the ranking flag measure, you can add it to your visual and then filter to where the measure is 1.
Once that is all finished, you should have a visual only showing the entries you care about. Here is the full list of data with the flag visible and the resulting table when applying the filter.
Consider this as a slight improvement to the accepted answer. With this one, you don't need to change the formula whenever you want to change how many you want to see.
The only control you will need to change is the filter.
RankTopBottom = RANKX(ALL(Data), [Total Amount], , ASC) *
RANKX(ALL(Data), [Total Amount], , DESC)
It uses basically the same principle as the accepted answer, but instead of using an IF, we multiply both rankings. The smallest products are at the edges (the top and bottom of the ranking); the greatest are in the middle.
So, when filtering, use "bottom" N, and pick an even number. (Or add a negative sign if you want the "top" N instead)
A quick result of multiplying inverse ranks:
I have following scenario which has been simplified a little:
Costs fact table:
date, project_key, costs €
Project dimension:
project_key, name, starting date, ending date
Date dimension:
date, years, months, weeks, etc
I need to create a measure that returns the project duration in days, using the starting and ending dates from the project dimension. The first challenge is that there aren't transactions for all days in the fact table. A project's starting date might be the 1st of January while its first cost transaction in the fact table is on the 15th of January. So we still need to count the days between the starting and ending dates that fall within the filter context.
The second challenge is the filter context. The user might want to view only a single month, such as February. So if the project's starting date is 1.6.2016, its ending date is 1.11.2016, and the user wants to view only September, it should display only 30 days.
The third challenge is showing days for multiple projects. So if the user selects only a single day, it should show the count of all projects in progress on that day.
I'm thankful for any help which could lead towards the solution. So don't hesitate to ask more details if needed.
edit: Here is a picture to explain this better:
Update 7.2.2017
Still trying to create a single measure for this solution: a measure the user could use with only dates, with only projects, or on its own. A separate calculated column for ongoing project counts per day would be an easy solution, but it would only filter by the date table.
Update 9.2.2017
Thank you all for your efforts. My conclusion is that calculations not based on the fact table are quite tricky. For this specific case I ended up creating a new table with a CROSS JOIN of dates and project ids to fulfill all requirements. Another option was to add the starting and ending dates as their own rows in the fact table with zero costs. The real solution also has more dimensions we need to take into consideration.
To get the expected result you have to create a calculated column and a measure. The calculated column counts the number of projects in progress on each date, and the measure counts the number of days elapsed between [starting_date] and [ending_date] in each project, taking filters into account.
The calculated column has to be created in the date_dim table using this expression:
Count of Projects =
SUMX (
FILTER (
project_dim,
[starting_date] <= EARLIER ( date_dim[date] )
&& [ending_date] >= EARLIER ( date_dim[date] )
),
1
)
The measure should be created in the project_dim table using this expression:
Duration (Days) =
DATEDIFF (
MAX ( MIN ( [starting_date] ), MIN ( date_dim[date] ) ),
MIN ( MAX ( [ending_date] ), MAX ( date_dim[date] ) ),
DAY
)
+ 1
The result you will get is something like this:
And this if you filter by week using a slicer or a filter on the date_dim table
Update
Support for SSAS 2014: DATEDIFF() is available in SSAS 2016 but not in SSAS 2014, so the measure above needs a replacement there.
First of all, it is important to realize that you are measuring two different things but want only one measure visible to your users. In the first expected result you want the number of projects running on each date, while in expected results 2 and 3 (in the OP) you want the days elapsed in each project, taking into account filters on date_dim.
You can create a measure that wraps both calculations in one and uses HASONEFILTER to determine the context in which each should run. Before continuing with the wrapping measure, look at the measure below, which replaces the one posted above that used DATEDIFF, a function that doesn't work in your environment.
After creating the calculated column above, which is required to determine the number of projects on each date, create a measure called Duration Measure. Your users won't use this measure directly, but it lets us calculate the final measure.
Duration Measure =
SUMX (
FILTER (
date_dim,
date_dim[date] >= MIN ( project_dim[starting_date] )
&& date_dim[date] <= MAX ( project_dim[ending_date] )
),
1
)
Now the final measure, the one your users interact with, can be written like this:
Duration (Days) =
IF (
HASONEFILTER ( date_dim[date] ),
SUM ( date_dim[Count of Projects] ),
[Duration Measure]
)
This measure will determine the context and will return the right measure for the given context. So you can add the same measure for both tables and it will return the desired result.
Although this solution is demonstrated in Power BI, it works in Power Pivot too.
First I would create 2 relationships:
project_dim[project_key] => costs_fact[project_key]
date_dim[date] => costs_fact[date]
The Costs measure would be just: SUM ( costs_fact[costs] )
The Duration (days) measure needs a CALCULATE to change the filter context on the Date dimension. This is effectively calculating a relationship between project_dim and date_dim on the fly, based on the selected rows from both tables.
Duration (days) =
CALCULATE (
COUNTROWS ( date_dim ),
FILTER (
date_dim,
date_dim[date] >= MIN ( project_dim[starting_date] )
&& date_dim[date] <= MAX ( project_dim[ending_date] )
)
)
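When several projects are selected, MIN and MAX above give the span from the earliest start to the latest end. If the goal is instead to add up each project's own days, a variant along these lines might work. It is an untested sketch that assumes, as above, that project_dim and date_dim are only related through costs_fact and do not filter each other; the measure name is hypothetical.

```dax
Duration (days, per project) =
SUMX (
    -- projects visible in the current filter context
    project_dim,
    COUNTROWS (
        FILTER (
            -- dates visible in the current filter context (slicer, axis, etc.)
            date_dim,
            date_dim[date] >= project_dim[starting_date]
                && date_dim[date] <= project_dim[ending_date]
        )
    )
)
```

With a single day selected, each project in progress contributes 1, so the result is the number of ongoing projects. With a month selected, each project contributes only its days within that month.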
I suggest splitting the Duration (days) measure into a separate calculated column and measure, as they don't actually have the same meaning under different contexts.
First of all, create one-to-many relationships between dates/costs and projects/costs. (Note the single cross filter direction; otherwise the filter context will be wrongly applied during calculation.)
For the Expected result 1, I've created a calculated column in the date dimension called Project (days). It counts how many projects are in progress for a given day.
Project (days) =
COUNTROWS(
FILTER(
projects,
dates[date] >= projects[starting_date] &&
dates[date] <= projects[ending_date]
)
)
P.S. If you want to have aggregated results on weekly/monthly basis, you can further create a measure and aggregate Project (days).
For Expected result 2 and 3, the measure Duration (days) is as follows:
Duration (days) =
COUNTROWS(
FILTER(
dates,
dates[date] >= FIRSTDATE(projects[starting_date]) &&
dates[date] <= FIRSTDATE(projects[ending_date])
)
)
The result will be as expected: