What happens when I filter data in Power BI?
I am connecting to Analysis Services and loading data from a cube and then filtering it on the Year column = "2022".
What happens to previous years' data? While the historical data is not used for the report, will it cause performance issues to load all the data from the source, or does filtering restrict the load to only the rows matching the filter criteria?
It depends on where you apply the filter.
If you filter the other years out in Power Query, you'll only get 2022 into Power BI. This can also shorten the import time a little.
Power BI itself works with subsets. If you use a page filter for the year 2022, it creates a subset containing only the 2022 rows, so the other years won't affect performance. But the file will get bigger, and opening it may take a bit longer compared with filtering the other years out in Power Query. Advantage: on other pages you still have the full dataset, including the years before 2022.
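For reference, a minimal sketch of the Power Query route, assuming a table with a Year column. The names are illustrative, and note that the literal you compare against must match the column's type (2022 as a number vs. "2022" as text):

let
    // "Sales" stands for a query already loaded from the cube (hypothetical name)
    Source = Sales,
    // Keep only the 2022 rows so nothing else is imported into the model
    Filtered = Table.SelectRows(Source, each [Year] = 2022)
in
    Filtered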
I'm using a calculated column that is an average, and the result is above the range of possible values, which should be impossible. The column calculates the average star rating (on a scale of 1-5), yet a visual shows the value 6, which couldn't happen even if every rating were 5 stars (and they aren't). So something must be pushing the average above the possible range, but it isn't in the original data source that Power BI pulls from: the source shows an average of 4.1, which is within the expected range. Somewhere, Power BI's dataset has introduced an outlier (or dropped data) that pushed the average up to 6.
I can elaborate on the DAX below, but what I want to do is pull the dataset down from Power BI to figure out why it's calculating the average that way. Since the source data averages 4.1 with no outliers, the source isn't the problem. Basically, I want to find whatever is causing the average rating to differ in Power BI.
Avg Rating = IF(SUM(data[Total Reviews]) = 0, BLANK(), SUM(data[Monthly Stars])/SUM(data[Total Reviews]))
Here's a screencap that shows the two relevant columns.
Note that I had to calculate these two columns manually (eyeballing the values and typing them into a calculator), which came out to ~4.6. I'm trying to download the dataset to explore it in more detail without eyeballing it, since the source doesn't show this discrepancy.
To get to the data, you have a number of options.
Create a new report in Power BI Desktop and use the Connect to Power BI dataset option to access the data in, for example, a table visual. You can also create your own report based on the dataset in the service.
Use Analyze in Excel, which lets you access the data in an Excel pivot table.
Use the Export data option on the visual; this lets you download 30,000 rows to CSV or 150,000 rows to XLSX.
Please note that these options may not be available to you if you lack the right permissions in the workspace, or if they have been turned off in the Power BI admin tenant settings.
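As a side note, if the goal is just to locate the rows that inflate the average, a diagnostic measure may get you there without exporting anything. A sketch, reusing the column names from the Avg Rating formula above:

Max Row Rating =
MAXX ( data, DIVIDE ( data[Monthly Stars], data[Total Reviews] ) )

Placed next to Avg Rating in a table visual broken down by a row-level key, any result above 5 flags rows where Monthly Stars exceeds 5 × Total Reviews (for example, stars recorded against a zero or missing review count). DIVIDE returns blank rather than an error when the denominator is 0.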
I have a Power BI report that pulls a current inventory of system statuses from an Excel spreadsheet. To keep it simple, say I have a single measure that reads "40% complete".
If I refresh the Power BI dataset and it now says "60%", is there any way to have a KPI automatically show +20%? Every example I've found requires another dataset that keeps the historical data, and that's not really an option in this situation. Is there any way to calculate or store the previous value within the Power BI query itself?
Power BI is not designed to store historical data. This is what a database is for.
In order to calculate that 20% difference, you need to store the historical value somewhere. Power BI's purpose is to connect to sources, load data, and visualize it, not to act as a data repository.
I am new to Power BI and, with the limited time given, I am stuck on how to come up with:
1. Table B, Row 1 below (the "1/20" and "M" for Monday cells): how do I place the date measures in their specific cells and put them in one column?
2. How can I merge the cells under the Total column?
3. How do I add all the numbers from the Type1 and Type2 columns and place the result in the merged cell from #2?
Any clues/direction/links on how to achieve the Target Table B below will be much appreciated.
P.S. Table A below (the current state) is just using the Matrix visualization in Power BI.
You can't do exactly what you're after. Power BI lets you rapidly put amazing visuals together, but that comes at the price of a lack of (easy) flexibility. You could build your own custom visual, look in AppSource for a visual that does this, or build the visual in some other tool (via custom code).
However, I'd recommend sticking with the Power BI matrix, which gives you a cascading drill-down, and working out how best to align your data to it and the other out-of-the-box visuals. Once you start delving into convoluted workarounds to give users data in exactly the format they request, you start to burn a lot of time. Look for alternatives that tell the data's story, and work with your end users to get their buy-in.
Just want to share that I resolved my problem not with one type of visualization, but with 3 different visualizations in Power BI. I used:
1 Table visual for Date column
1 Table visual for Total column
1 Matrix visual for the Code+Type mapping and counts
I also used a DAX function to get the date format, and another DAX function for both the Total and Code+Type counts (to filter data according to the specified date).
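For reference, a date label like "1/20" with a weekday initial "M" can be built as a DAX calculated column along these lines. This is only a sketch with illustrative table and column names; the poster's actual formula isn't shown:

Date Label =
FORMAT ( 'Calendar'[Date], "m/d" ) & " "
    & LEFT ( FORMAT ( 'Calendar'[Date], "ddd" ), 1 )

One caveat: single-letter weekday initials are ambiguous (Tuesday and Thursday both start with "T"), so a two-letter abbreviation may be safer.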
Thanks for the response, @Murray and @RADO.
I want to import a 500 GB dataset into Power BI, but Power BI is limited to 1 GB. How can I get the data into Power BI?
Thanks.
For 500 GB I'd definitely recommend Direct Query mode (as Joe recommends) or a live connection to an SSAS cube. In these scenarios, the data model is hosted in a separate location (such as a database server); Power BI sends its queries to that location and displays the returned results.
However, I'll add that the 1GB limit is the limit after compression. (Meaning you can fit more than 1GB of uncompressed data into the advertised 1GB dataset limit.)
While it would be incredibly difficult to reduce a 500GB dataset to 1GB (even with compression), there are things you can do once you understand how the compression works in Power BI.
In Power BI, compression is done by columns, not rows. So a column that has 800 million rows with identical values can see significant compression. Likewise, a column with a different value in every row cannot be compressed much at all.
Therefore:
Do not import columns you do not absolutely need for analysis (particularly identity columns, GUIDs, free-form text fields, or binary data such as images).
Look at columns with a high degree of variability and see if you can also eliminate them.
Reduce the variability of a column where possible. E.g. if you only need a date & not a time, do not import the time. If you only need the whole number, do not import 7 decimal places.
Bring in fewer rows. If you cannot eliminate high-variability columns, then importing 1 year of data instead of 17 (for example) will also reduce the data model size. (See the Power Query sketch after this list.)
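A minimal Power Query sketch of these pruning steps; the source, table, and column names are all hypothetical:

let
    // Hypothetical SQL source and table
    Source = Sql.Database("myserver", "mydb"),
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // 1. Drop columns not needed for analysis (IDs, GUIDs, free text)
    Pruned = Table.RemoveColumns(Sales, {"RowGuid", "Comments"}),
    // 2. Reduce variability: keep the date, drop the time of day
    DateOnly = Table.TransformColumns(Pruned, {{"OrderTimestamp", DateTime.Date, type date}}),
    // 3. Reduce variability: round away unneeded decimal places
    Rounded = Table.TransformColumns(DateOnly, {{"Amount", each Number.Round(_, 2), type number}}),
    // 4. Bring in fewer rows: keep only the most recent year
    Recent = Table.SelectRows(Rounded, each [OrderTimestamp] >= #date(2022, 1, 1))
in
    Recent

Where the source is a SQL database, steps like these typically fold back into the source query, so the pruning happens server-side before anything is transferred.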
Marco Russo & the SQLBI team have a number of good resources for further optimizing the size of a data model (SSAS tabular, Power Pivot & Power BI all use the same underlying modelling engine). For example: Optimizing Multi-Billion Row Tables in Tabular
If possible given your source data, you could use Direct Query mode. The 1 GB limit does not apply to Direct Query. There are some limitations to Direct Query mode, so check the documentation to make sure that it will meet your needs.
Some documentation can be found here.
1) Aggregate the data on the SQL side to reduce its size.
2) Import only the columns you actually use, which also reduces size.
I have been working with Power BI for a while now, and I often get confused when I browse through its help topics. They refer to the functions and formulas being used as DAX functions or Power Query, but I am unable to tell the difference between the two. Please guide me.
M and DAX are two completely different languages.
M is used in Power Query (a.k.a. Get & Transform in Excel 2016) and the query tool for Power BI Desktop. Its functions and syntax are very different from Excel worksheet functions. M is a mashup query language used to query a multitude of data sources. It contains commands to transform data and can return the results of the query and transformations to either an Excel table or the Excel or Power BI data model.
More information about M can be found here and using your favourite search engine.
DAX stands for Data Analysis eXpressions. DAX is the formula language used in Power Pivot and Power BI Desktop. DAX uses functions to work on data that is stored in tables. Some DAX functions are identical to Excel worksheet functions, but DAX has many more functions to summarize, slice and dice complex data scenarios.
There are many tutorials and learning resources for DAX if you know how to use a search engine. Or start here.
In essence: First you use Power Query (M) to query data sources, clean and load data. Then you use DAX to analyze the data in Power Pivot. Finally, you build pivot tables (Excel) or data visualisations with Power BI.
M is the first step of the process, getting data into the model.
In Power BI, when you right-click on a dataset and select Edit Query, you're working in M (also called Power Query); the title bar of the edit window says Power Query Editor, but you have to know that M and Power Query are essentially the same thing. Also, when you click the Get Data button, Power BI generates M code for you.
DAX is used in the report pane of Power BI Desktop, and is predominantly used to aggregate (slice and dice) the data, add measures, etc.
There is a lot of crossover between the two languages (e.g. you can add columns and merge tables in both). Some discussion on when to choose which is here and here.
Think of Power Query / M as the ETL language that will be used to format and store your physical tables in Power BI and/or Excel. Then think of DAX as the language you use after the data has been queried from the source, to calculate totals, perform analysis, and do other functions.
M (Power Query): Query-Time Transformations to shape the data while you are extracting it
DAX: In-Memory Transformations to analyze data after you've extracted it
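To make the contrast concrete, here is the same line-total logic expressed both ways (a sketch; the Orders table and its columns are illustrative):

// M (Power Query): a query-time step, run during refresh, that shapes the stored table
AddedTotal = Table.AddColumn(Orders, "LineTotal", each [Quantity] * [UnitPrice], type number)

-- DAX: an in-memory measure, evaluated at report time and responsive to slicers and filters
Total Sales = SUMX ( Orders, Orders[Quantity] * Orders[UnitPrice] )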
One other thing worth mentioning regarding performance optimisation: you should "prune" your dataset (remove rows and columns) as far "upstream" in the data processing sequence as possible. This means such operations are better done in Power Query than in DAX. Some further advice from Microsoft is here: https://learn.microsoft.com/en-us/power-bi/power-bi-reports-performance