Power BI - Show hierarchical/grouping data - powerbi

I'm looking to see if it's possible in Power BI to have a widget that shows "common" data in one row, then "differentiating" data below it in other rows.
For example, let's say I want to show a list of TV shows. And below each TV show, I want to show data about each season. So data might look like (I'm not filling out all of the data, just enough to show an example):
TV Show Title   | Broadcaster | Genre        | Season # | Season Premiere Episode Title
Mad Men         | AMC         | Period Drama | 1        | Smoke Gets in Your Eyes
Mad Men         | AMC         | Period Drama | 2        | For Those Who Think Young
Game of Thrones | HBO         | Fantasy      | 1        | Winter is Coming
Game of Thrones | HBO         | Fantasy      | 2        | The North Remembers
I want to show the data that looks kind of like this:
    | TV Show Title   | Broadcaster | Genre        | Season # | Season Premiere Episode Title
+/- | Mad Men         | AMC         | Period Drama | 1        | Smoke Gets in Your Eyes
    |                 |             |              | 2        | For Those Who Think Young
+/- | Game of Thrones | HBO         | Fantasy      | 1        | Winter is Coming
    |                 |             |              | 2        | The North Remembers
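Outside Power BI, the target layout amounts to grouping rows on the shared columns and blanking those columns on every row after the first in each group. A minimal Python sketch of that transformation, assuming the rows are already sorted by show (the function name `collapse_common` and the `n_common` parameter are made up for illustration):

```python
from itertools import groupby

# Rows from the question: (title, broadcaster, genre, season, premiere).
rows = [
    ("Mad Men", "AMC", "Period Drama", 1, "Smoke Gets in Your Eyes"),
    ("Mad Men", "AMC", "Period Drama", 2, "For Those Who Think Young"),
    ("Game of Thrones", "HBO", "Fantasy", 1, "Winter is Coming"),
    ("Game of Thrones", "HBO", "Fantasy", 2, "The North Remembers"),
]

def collapse_common(rows, n_common=3):
    """Blank the first n_common fields on every row after the first in each group."""
    out = []
    # groupby only merges adjacent rows, so the input must be sorted by show.
    for _, group in groupby(rows, key=lambda r: r[:n_common]):
        for i, row in enumerate(group):
            common = row[:n_common] if i == 0 else ("",) * n_common
            out.append(common + row[n_common:])
    return out

for row in collapse_common(rows):
    print(" | ".join(str(cell) for cell in row))
```

This only illustrates the shape of the result; it says nothing about how the Power BI matrix visual renders it.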
At first I thought I could use a matrix, but it doesn't seem to work the way I hoped.
Suggestions?

Like this?
If so, turn off the stepped layout in the matrix's formatting options.
My data looks like this

Related

Wrong calculations for rows in Power BI matrix

I am trying to calculate market share, but struggling with doing it correctly.
I have a matrix where I have Category, Name as rows, Channel as column, and Market Share as value.
In my dataset I also have the columns ABS_COMPANY (sales are entered there only if company = "A", so some rows are blank) and ABS_TOTAL (sales are entered in every row).
So my measure is:
Market Share = SUM(table[ABS_COMPANY]) / SUM(table[ABS_TOTAL])
This correctly calculates the value for each Category, but when I expand a Category to see Name, the Market Share of each Name equals 100%. What is the problem and how do I fix it?
e.g. What is now:
            Market Share
Pens        43%
  pen 1     100%
  pen 2     100%
  pen 3     100%
Pencils     29%
  penc 1    100%
  penc 2    100%
  penc 3    100%
I've tried using CALCULATE(), but it does not work the way I want it to.
Unfortunately, I cannot share the data as it is sensitive.
Structure of dataset:
NAME STRING
CATEGORY STRING
CHANNEL STRING
ABS_COMPANY DECIMAL(20,2) - value of sales for each name
ABS_TOTAL DECIMAL(20,2) - it is a value grouped by CHANNEL AND CATEGORY at the backend

Best imputation for principal component analysis

I have a list of assets. I use these assets to create an asset index using pca:
local assets newspaper magazine clock water borehole table bed study bicycle cart car motorcycle tractor electricity fridge airconditioner fan washing_machine vacuum
alpha `assets'
pca `assets'
predict asset_idx
label var asset_idx "pupil asset ownership index: created using pca"
egen asset_idx_std = std(asset_idx)
However, about 3% of values are missing for each asset variable. This amounts to 30% of observations after pairwise deletion. As such, I wish to impute the missing values so that students with less than 10% of values missing are not dropped during PCA. I would otherwise prefer not to mi set the data, but can't work out another way:
mi set wide
mi register imputed newspaper magazine clock water borehole table bed study bicycle cart car motorcycle tractor electricity fridge airconditioner fan washing_machine vacuum computer internet radio tv vcr dvd cd cassette camera digi_camera vid_cam landline phone
set seed 1234
mi impute logit newspaper magazine clock water borehole table bed study bicycle cart car motorcycle tractor electricity fridge airconditioner fan washing_machine vacuum computer internet radio tv vcr dvd cd cassette camera digi_camera vid_cam landline phone, force add(1)
Unfortunately, this is only successfully imputing a small fraction of missing observations:
------------------------------------------------------------------
                   |              Observations per m
                   |----------------------------------------------
          Variable |   Complete   Incomplete   Imputed |     Total
-------------------+-----------------------------------+----------
         newspaper |       7007          110         4 |      7117
------------------------------------------------------------------
Any suggestions on appropriate imputation are much appreciated.

How to deal with multiple ids multiple categories table to reach THIS on Power BI

I have a problem that I have been trying to solve for 3 days without success.
I have the following tables:
Companies:
company_id | sales
1          | 2000
2          | 3000
3          | 4000
4          | 1000
Categories:
company_id | category
1          | medical
1          | sports
2          | industrial
3          | consumption
4          | medical
4          | consumption
All I want is a COLUMN CHART with a CATEGORY SLICER where I choose the category and see the TOP 5 companies by sales in that category. In this example the Top N is not really needed, but my real data has 400 companies, so I want to:
Select and show only the required category.
In that category, show only the 5 best companies by sales.
The problem is that Power BI evaluates the Top N filter over all companies, so if I choose a category and also apply a Top 5, companies that are not in the overall Top 5 do not show up at all.
Thanks!
If you always want to show the same Top N values in your visual, you can use the filter pane to achieve that.
Below a walk through:
To add the Top N filtering, I add the following:
It is in Dutch, so a little clarification:
I add a 'filter on this visual'
I selected Populairste N, which is Top N
As the value, I dragged and dropped the maximum of sales.
Results:
Things to keep in mind:
You are using a many-to-many relationship; make sure that it is activated in the Power BI model.
Make sure the direction of filtering is from category to sales, otherwise the slicer will not work. It looks like this:
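As a rough illustration of the logic the slicer and the Top N filter are meant to implement (not of the model diagram itself): filter to the selected category first, then rank only the remaining companies by sales. A minimal Python sketch using the sample tables from the question; the function name `top_n_in_category` is made up for illustration and is not a Power BI feature.

```python
# Sample data copied from the question's Companies and Categories tables.
sales_by_company = {1: 2000, 2: 3000, 3: 4000, 4: 1000}
company_categories = [
    (1, "medical"), (1, "sports"), (2, "industrial"),
    (3, "consumption"), (4, "medical"), (4, "consumption"),
]

def top_n_in_category(category, n=5):
    """Return up to n (company_id, sales) pairs for one category, best first."""
    # Step 1: apply the category "slicer" first.
    members = {cid for cid, cat in company_categories if cat == category}
    # Step 2: rank only the surviving companies by sales.
    ranked = sorted(members, key=lambda cid: sales_by_company[cid], reverse=True)
    return [(cid, sales_by_company[cid]) for cid in ranked[:n]]

print(top_n_in_category("medical"))
print(top_n_in_category("consumption", n=1))
```

The order of the two steps is the point: ranking before filtering reproduces the empty result described in the question.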

RANKX in table and slicers

This is table Fact:
Date Person Place Status Sales
01/01/2020 ABC North Active 9852
14/01/2020 DEF North Active 3452
17/01/2020 GHI North Active 9084
02/02/2020 GHI North Active 4902
14/02/2020 GHI North Active 4659
14/02/2020 DEF South Inactive 5000
23/02/2020 GHI North Active 1685
10/03/2020 GHI North Active 6401
21/03/2020 ABC Center Active 4742
09/04/2020 ABC Center Active 6325
14/04/2020 ABC Center Active 8329
27/04/2020 ABC Center Inactive 7740
28/04/2020 ABC Center Inactive 5091
02/05/2020 ABC Center Inactive 3763
04/05/2020 ABC Center Inactive 1434
06/05/2020 DEF Center Active 3718
22/05/2020 DEF South Active 6639
03/06/2020 DEF South Active 5672
12/06/2020 DEF South Active 5268
16/06/2020 DEF South Active 3358
I want to calculate the ranking of sales, depending on slicers for dimensions date, person, status and place.
So:
This measure:
TotalSales = SUM('Fact'[Sales])
gives me the total sales.
And:
This measure:
Ranking =
IF (
    [TotalSales],
    RANKX (
        ALLSELECTED ( 'Fact' ),
        CALCULATE ( [TotalSales] ),
        ,
        DESC,
        DENSE
    )
)
Is supposed to give me the ranking.
And it does, if all the dimensions are in:
However, if I remove the date:
This is nonsense. Can anyone help me calculate the rank? Thanks
Sorry I'm replacing my answer completely because I misunderstood your problem.
I used the following:
Rank1 =
CALCULATE (
    RANKX (
        ALLSELECTED ( Table1[Person], Table1[Place], Table1[Status] ),
        CALCULATE ( SUM ( Table1[Sales] ) )
    )
)
Using your data, I made the visuals below, and am able to slice by place, person, and status. The problem with slicing by date, however, is that it's too granular, so that everything becomes #1 within that date period. You may want to add new columns like "Year" or "Quarter" which you can then add to the ALLSELECTED() statement in order to rank by, say, the person's best quarter, or the area's best year.
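To make the granularity point concrete, here is a small Python sketch (not DAX) that sums sales per group and then dense-ranks the groups, once by person/place/status and once by person and calendar quarter. It uses the rows from the question's Fact table; the helper names `dense_rank_by` and `quarter` are hypothetical, and dense ranking is assumed because the question's measure specifies DENSE.

```python
from collections import defaultdict

# Rows from the question's Fact table: (date DD/MM/YYYY, person, place, status, sales).
FACT = [
    ("01/01/2020", "ABC", "North", "Active", 9852),
    ("14/01/2020", "DEF", "North", "Active", 3452),
    ("17/01/2020", "GHI", "North", "Active", 9084),
    ("02/02/2020", "GHI", "North", "Active", 4902),
    ("14/02/2020", "GHI", "North", "Active", 4659),
    ("14/02/2020", "DEF", "South", "Inactive", 5000),
    ("23/02/2020", "GHI", "North", "Active", 1685),
    ("10/03/2020", "GHI", "North", "Active", 6401),
    ("21/03/2020", "ABC", "Center", "Active", 4742),
    ("09/04/2020", "ABC", "Center", "Active", 6325),
    ("14/04/2020", "ABC", "Center", "Active", 8329),
    ("27/04/2020", "ABC", "Center", "Inactive", 7740),
    ("28/04/2020", "ABC", "Center", "Inactive", 5091),
    ("02/05/2020", "ABC", "Center", "Inactive", 3763),
    ("04/05/2020", "ABC", "Center", "Inactive", 1434),
    ("06/05/2020", "DEF", "Center", "Active", 3718),
    ("22/05/2020", "DEF", "South", "Active", 6639),
    ("03/06/2020", "DEF", "South", "Active", 5672),
    ("12/06/2020", "DEF", "South", "Active", 5268),
    ("16/06/2020", "DEF", "South", "Active", 3358),
]

def dense_rank_by(rows, key):
    """Sum sales per group given by key(row), then dense-rank groups (1 = highest)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[4]
    # Distinct totals, highest first; equal totals share a rank with no gaps.
    distinct = sorted(set(totals.values()), reverse=True)
    rank_of = {value: i + 1 for i, value in enumerate(distinct)}
    return {group: (total, rank_of[total]) for group, total in totals.items()}

def quarter(date_text):
    """Calendar quarter (1-4) of a DD/MM/YYYY date string."""
    month = int(date_text.split("/")[1])
    return (month - 1) // 3 + 1

# Grouping without the date, mirroring the column list passed to ALLSELECTED.
by_dims = dense_rank_by(FACT, key=lambda r: (r[1], r[2], r[3]))
# Coarser date grouping: person and quarter instead of individual days.
by_quarter = dense_rank_by(FACT, key=lambda r: (r[1], quarter(r[0])))

for group, (total, rank) in sorted(by_dims.items(), key=lambda kv: kv[1][1]):
    print(rank, group, total)
```

Grouping on the raw date instead would leave almost every group with a single row, which is the "too granular" effect described above.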
I opened your file & have some good news. I was able to find the issue.
STEP 1
Select all the Months & Places in your filters at the top. Hold CTRL to select multiple items. On the right, get rid of all the things that are in the way. Just check what I show here.
STEP 2
Un-check the Date to see that the report goes crazy.
STEP 3
Un-check Total Sales & check Sales instead. You can also re-check Date here.
STEP 4
Hit the drop-down menu for Sales & select Don't Summarize.
STEP 5
Enjoy that you now have all 20 records again, but with no Date showing.

How to parse a long dataframe of text using regular expressions into a dataframe [R]

I have a giant data frame which I would like to turn into a more usable format. It is based on a copy-pasted text file of a schedule, where the entries have a consistent format. So, I know that each day will have the format:
###
Title - Date
First event
Time: 11:00 AM
Location: Address
Address line 2
Second event description
Time: 12:00 AM
Location: Address
Address
###
What I am having trouble with is figuring out how to parse this. Basically, I want to store everything between the "###"'s as a single day, and then add events based on how many times the above format repeats, and make a string or datetime entry based on if letters are following a "Time:" or a "Location:".
However, I am really having trouble with this. I have tried putting it all into a giant dataframe where each line is a row, and then adding dummies for location rows, time rows, etc. as separate columns, but am not sure how to translate that into discrete days or events. Any help would be appreciated.
Data is public, so a sample is below -- it is a giant dataframe with one row for each row of text:
*Text*
###
The Public Schedule for Mayor Rahm Emanuel – January 5, 2014
There are no public events scheduled at this time.
###
The Public Schedule for Mayor Rahm Emanuel – January 6, 2014
Mayor Rahm Emanuel will greet and thank snow clearing teams from the Department of Streets and Sanitation.
Time: 11:30 AM
Location: Department of Streets and Sanitation
Salt Station
West Grand Avenue and North Rockwell Street
Chicago, IL*
*There will be no media availability.
Mayor Rahm Emanuel and City Officials will provide an update on the City’s efforts to provide services during the extreme weather.
Time: 2:00 PM
Location: Office of Emergency Management and Communications
Headquarters
1411 West Madison Street
Chicago, IL*
*There will be a media availability following.
Mayor Rahm Emanuel will greet and visit with residents taking advantage of a City warming center.
Time: 3:00 PM
Location: Department of Family and Support Services
10 South Kedzie Avenue
Chicago, IL**
*There will be no media availability.
**Members of the media must respect the privacy of residents at the facility, and can only film City of Chicago employees.
###
Edit:
An example output I would like is something like (sorry the code here is broken, not sure why!):
Date         | Time     | Description          | Location
December 4th | 9:00 AM  | A housewarming party | 1211 Main St.
December 5th | 11:00 AM | Another big event    | 1234 Main St.
If at all possible.
EDIT 2:
Basically -- I know how to pull all this stuff out of the columns. I think my issue may really be reshaping the data intelligently. I have split it into this giant dataframe with one column where each row is a string which corresponds to a row of text in the original schedule. I then have a bunch of columns like "is_time", "is_new_entry", "is_location" which are 1 if the row is a time, new entry beginning, or location. I just don't know how to reshape this into the output above.
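One way to think about the reshaping is as a single pass over the lines that keeps track of the current day and the current event. The sketch below is in Python rather than R and only illustrates the logic. It assumes that a line directly followed by a "Time:" line is an event description, that lines after "Location:" belong to the address until a footnote (a line starting with "*"), a new description, or "###" appears, and that trailing asterisks are footnote markers to strip. The function name `parse_schedule` and the output field names are made up for illustration.

```python
import re

def parse_schedule(lines):
    """Turn schedule lines into one dict per event: date, time, description, location."""
    events = []
    date = None
    current = None        # event currently being filled in
    in_location = False   # True while reading address continuation lines
    expect_title = False  # True right after a "###" separator
    for i, raw in enumerate(lines):
        line = raw.strip()
        nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if not line:
            continue
        if line == "###":
            expect_title, current, in_location = True, None, False
        elif expect_title:
            # "Title – Date": keep the text after the last dash separator.
            date = re.split(r"\s+[–-]\s+", line)[-1]
            expect_title = False
        elif nxt.startswith("Time:"):
            # A line directly followed by "Time:" starts a new event.
            current = {"date": date, "time": None,
                       "description": line, "location": ""}
            events.append(current)
            in_location = False
        elif line.startswith("Time:") and current is not None:
            current["time"] = line[len("Time:"):].strip()
        elif line.startswith("Location:") and current is not None:
            current["location"] = line[len("Location:"):].strip().rstrip("*")
            in_location = True
        elif line.startswith("*"):
            in_location = False  # footnote line: ends the address block
        elif in_location:
            current["location"] += ", " + line.rstrip("*")
    return events

# Shortened sample taken from the schedule text in the question.
sample = [
    "###",
    "The Public Schedule for Mayor Rahm Emanuel – January 5, 2014",
    "There are no public events scheduled at this time.",
    "###",
    "The Public Schedule for Mayor Rahm Emanuel – January 6, 2014",
    "Mayor Rahm Emanuel will greet and thank snow clearing teams from the Department of Streets and Sanitation.",
    "Time: 11:30 AM",
    "Location: Department of Streets and Sanitation",
    "Salt Station",
    "West Grand Avenue and North Rockwell Street",
    "Chicago, IL*",
    "*There will be no media availability.",
    "Mayor Rahm Emanuel will greet and visit with residents taking advantage of a City warming center.",
    "Time: 3:00 PM",
    "Location: Department of Family and Support Services",
    "10 South Kedzie Avenue",
    "Chicago, IL**",
    "*There will be no media availability.",
    "###",
]

for event in parse_schedule(sample):
    print(event)
```

The same state-machine idea should carry over to R, for example by computing a day id and an event id as running counts over the "is_new_entry" style flag columns and then collapsing the rows within each event.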