Basically I have a big Excel dataset about 500x500 with economic information from various companies.
Each row is representing a different company and in columns we have the information. A little bit of it is qualitative like ZIP code, type, etc. But most of it is quantitative. For each of the quantitative info, we have info for 5 years, so we have one column for each year and for each information i.e. Debt 2019, Debt 2020, etc.
So my question is which is the best way to preprocess this data to work with it and how should it be done. Either doing the preprocessing with Excel, running a Script on PowerBI, using Query, SQL, ...
The objective is to have a report which will be accessible online and the user will type the name of the company and it will show them the dashboard with the information of that company (only that one), so they can navigate through it.
The structure and which information is shown is the same for each company, the only thing that changes is the "numbers" that each company has. So it has to be possible to change which data is showing (to use the one from the company they want).
It also needs to be able to show comparative data to other groups of companies or to the total.
I want to have it right from the start, because then changes get complicated.
I thought about doing sort of a "relational model" with one "table" for each company with the quantitative data (with one row for each year and each column one info point) and then a general table with the qualitative data (with rows being each company and the columns the info). But I am not really sure.
I know how to use Power BI but I have never used it for something this big. I would like to know which way to organize this data is better and some info on how to do it.
Many thanks to everyone.
I thought about doing sort of a "relational model" with one "table" for each company with the quantitative data (with one row for each year and each column one info point) and then a general table with the qualitative data (with rows being each company and the columns the info).
Yes, do that.
General guidance is to use Power Query in PowerBI to transform the data into a star schema model. See Understand star schema and the importance for Power BI
So that would typically result in one table that has the "dimension" data for each company, a date table, and a "fact" table at the grain of (CompanyId,Date) with the quantitative data.
Related
I have a dashboard that looks like this in PowerBi:
Almost every slicer and visual on this page comes from the "visits" dataset. That dataset is 70,000+ rows, where each row stands for a single patient visit to the hospital. There are a few relevant columns for this question such as: "subject mrn", and "protocol_no" (the study they're on).
Well, elsewhere, I have a dataset called "Data Managers" that is the staff assigned to each protocol. It has relevant columns of "subject mrn", "protocol no" and "staff name"
I have these datasets in my power bi like this:
When I connect these datasets by dragging in between them, Power BI warns me that they are many-to-many relationships. This makes sense because:
Lets say staff member John is the data manager for patient 12345 on study x
Well patient 12345 might also be on study y, and on that study, staff member steve is the data manager.
Also, other patients on study x might have other data managers.
So I need to connect these datasets in a way that when I filter to John, I only get rows back from the visits data where John is the data manager for that combination of subject AND study.
When I just drag across from protocol no and subject mrn like this
it doesnt work. The dropdown appears to filter to lists for john, but when I check for accuracy, its people with totally different data managers. Any idea what to do?
If anybody is looking at this, theres probably a way to do it with managing multiple relationships, but I ended up creating a concatenated column in each dataset of "Protocol, subject_mrn" and then linking those new columns together.
I'm starting with Power Bi and I need some help to create the data model for this practice case.
I receive the data in a table that has the power consumption of different houses by hour. In the first column it has the date and hour and the rest of columns has the power consumption of each house (one column for each house).
My idea is to create a dashboard with a map as a filter to select the house and a date slicer to filter a period of time.
I dont know exactly what is the better way to face that, if create a subtable for each house or what.
I know that I also have to create another table with the information of each house with coordinates and relate it to the information table, but this information table has the name of the houses in the header of each column and the name is followed by the text "Power Cosumption".
I would appreciate your help, I think that this is not a complicate problem and you can give me some clues or examples to face it.
Thanks!
Regards
I'm trying to build a report in a table that returns a count of completed checklists that are filtered by 2 Date Intelligence slicers. I have the targets for that month in a table, but I'm not sure how to change the measure by what's selected in the slicer.
I would like the measure to return the Target.Monthname of the selected slicers from the table Targets. The slicer is based on the DonesafeFolder Table with "Date of Completion"
Your question is a bit hard to understand.
But from your pictures, I think you should take a step back.
when using powerBI it is recommended to use a star schema, with facts and dimensions tables. Usually you model that when you import your data (with power query).
When using a star schema you also have a date dimension with the month on the rows instead of columns and powerBI is better at handling data structured like that. to lean about designing the data model you can read MS's guide to star schemas
Of cause, if you know what you are doing, you can use your approach, but it makes things alot harder.
BTW. you can't access the values in the slicers/filters directly. they filters the column you select which are then passed to the measures you create. If your datamodel is sound they should indirectly filter your measure.
I am encountering a problem in BI (data visualization over historical data) that isn't unique to any BI tool (e.g., Power BI, Qlik Sense, Tableau). I need a way to add context (e.g., descriptive text) to certain events in my company's historical data, such that we don't need to explain anomalies in data visuals to new users of a report. For example, in a visual of sales over time, we want verbiage to appear in a tooltip for certain points in time. This verbiage will be created by report users and saved (somehow). So, this is like storytelling in BI, but with the difficulty that the verbiage / context needs to remain after dataset refreshes. It would be ideal to have this be tool-agnostic, but it's fine if it needs to be tool-specific (e.g., using Power BI's comments feature).
Is there a way to achieve this?
For summary reports (that display aggregated values) you may include these descriptive texts into the report as a special measure.
For example, let's assume that your report displays sales amount per year/month/category. In terms of SQL data for this report is loaded with a query like that:
select year(date_col) as dim_year, month(date_col) as dim_month, sum(amount) as sales_amount from transactions
group by 1, 2
To display some commentary text it should be joined to the aggregation result - for example:
select dim_year, dim_month, sales_amount, c.comment as sales_amount_comment from (
select year(date_col) as dim_year, month(date_col) as dim_month, sum(amount) as sales_amount
from transactions
group by 1, 2
) res
left join comments c on (c.year=res.dim_year and c.month=res.dim_month)
(assuming that comments table has columns year, month, comment)
Next steps depend on your BI tool, in most cases you can simply display 2 measures (sales_amount and sales_amount_comment) in the pivot table, if BI tools allows custom HTML formatting this could be a one table cell that shows amount value and displays the comment on mouse over (say, wrapped with <div title='comment text'>) or smth like that.
It is up to you how users will populate comments table; it can be loaded from Google Sheets, or simple custom CRUD app may be used to add/edit records.
I have created a powerpivot model include in the image below. I am trying to include the "IncurredLoss" value and have it sliced by time. Written Premium is in the fact table and is displaying correctly. I am aiming for IncurredLoss to display in a similar fashion
I have tried the following solutions:
Add new related column: Related(LossSummary[IncurredLoss]). Result: No data
DAX Summary Measure: =CALCULATE(SUM(LossSummary[IncurredLoss])). Result: Sum of everything in LossSummary[IncurredLoss] (not time sliced)
Simply adding the Incurred Loss column to the Pivot Table panel. Result: Sum of everything in LossSummary[IncurredLoss] (not time sliced)
A few other notes:
LossKey joins LossSummary to PolicyPremiumFact
Reportdate joins PolicyPremiumFact to the Calendar.
There is 1 row in LossSummary per date and Policy. LossKey contains this information and is the PK on that table.
Any ideas, clarifications or pointers are most certainly welcome. Thank you!
The related column should work. I was able to get it to work in both Excel 2016 and Power BI Desktop. Rather than bombarding you with questions, I'll try and walk through how I would troubleshoot further, in the hopes it gets you to a solution faster:
First, check the PolicyPremiumFact table inside Power Pivot and see if the IncurredLossRelated field is blank or not. If it is consistently blank, then the related column isn't working. The primary reason the related column wouldn't work is if there's a problem with your relationships. Things I would check:
Ensure that the relationships are between the fields you think they are between (i.e. you didn't accidentally join LossKey in one table to a different field in the other table)
Ensure that the joined fields contain the same data (i.e. you didn't call a field LossKey, but in fact, it isn't the LossKey at all)
Ensure that the joined fields are the same data type in Power Pivot (this is most common with dates: e.g. joining a text field that looks like a date to an actual date field may work, but not act as expected)
If none of the above are the problem, it doesn't hurt to walk through your data for a given date in Power Pivot. E.g. filter your PolicyPremiumFact table to a specific date and look at the LossKeys. Then go the LossSummary table and filter to those LossKeys. Stepping through like this might reveal an oversight (e.g. maybe the LossKeys weren't fully loaded into your model).
If none of the above reveals anything, or if the related column is not blank inside Power Pivot, my suggestion would be to try a newer version of Excel (e.g. Excel 2016), or the most recent version of Power BI Desktop.
If the issue still occurs in the most recent version of Excel/Power BI Desktop, then there's something else going on with your data model that's impacting the RELATED calculation. If that's the case, it would be very helpful if you could mock up your file with sample data that reproduces the problem and share it.
One final suggestion I have is to consider restructuring your tables before they arrive in your data model. In your case, I'd recommend restructuring PolicyPremiumFact to include all the facts from LossSummary, rather than having a separate table joined to your primary fact table. This is what you're doing with the RELATED field to some extent, but it's cleaner to do before or as your data is imported into Power Pivot (e.g. using SQL or Power Query) rather than in DAX.
Hope some of this helps.