What is the difference between hiding a table and unselecting 'Enable load'? Both options result in the table/field not showing up in the report.
It's not only about the report. It's also about efficiency and storage.
There are situations when queries serve as intermediary data sources, which are only used for joins into other tables and after that are not required for any visuals. In that case, you don't need to load the table into the data model at all and the data in that table will not use up space in the data model.
This helps keep the data model cleaner and the PBIX file smaller. Hiding tables from the report does not do that.
Edit after comment: here is a scenario where I would disable data load for a query. Say I have a customer table with customer ID, name, region and a hundred other columns. It has 100,000 customers; most of them are historic and only 10,000 are active. My Transaction table has only the customer ID, but I want to report by region.
I can load the customer table into Power Query, then join it to the Transaction table and keep only the region in the Transaction table. I don't need any other details from the customer table because all my reporting only drills down to region level, not individual customer. Also my report is focused on a segment of my business that involves only a fraction of my existing customers. Therefore, I don't need the whole customer table in the data model. I save the Customer query as a connection only, and don't load it into the data model to save the space.
Yes, I could load it and use a relationship instead of the join, but by loading just the region field for a subset of customers, I can keep the model much leaner.
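The effect of the merge can be sketched in Python to show what actually ends up in the model. This is a minimal sketch under assumptions, not Power Query M: the column names (customer_id, region, etc.) and the sample rows are hypothetical. The wide customer table only serves as a lookup (the role of the connection-only query); the output is the transaction rows plus a single region column.

```python
# Hypothetical sample data: a wide customer table and a narrow transaction table.
customers = [
    {"customer_id": 1, "name": "Acme", "region": "North", "active": True},
    {"customer_id": 2, "name": "Bolt", "region": "South", "active": False},
    {"customer_id": 3, "name": "Core", "region": "South", "active": True},
]
transactions = [
    {"txn_id": 100, "customer_id": 1, "amount": 250.0},
    {"txn_id": 101, "customer_id": 3, "amount": 80.0},
    {"txn_id": 102, "customer_id": 1, "amount": 40.0},
]


def add_region(transactions, customers):
    """Left-join the region onto each transaction, keeping only that one column."""
    # Build a lookup holding only the column we need; the other customer
    # columns are never copied into the result.
    region_by_id = {c["customer_id"]: c["region"] for c in customers}
    # Customers without a transaction never appear; unmatched IDs get None.
    return [{**t, "region": region_by_id.get(t["customer_id"])} for t in transactions]


enriched = add_region(transactions, customers)
```

Because the result is driven by the transaction rows, customers that never transact (such as the inactive customer 2 here) do not reach the model at all, which is the "subset of customers" effect described above.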
I have a linked SQL database that holds the complete history of products and users. I want the user to be able to select a year in a slicer and have the data automatically show active products, expired products and new products added in that year (or snapshot).
Is there a way to do this? I haven't been able to find a measure that does it.
I recommend creating a date dimension table first (I usually call mine Calendar). This post by Radacad shows how to create one: https://radacad.com/power-bi-date-or-calendar-table-best-method-dax-or-power-query
Once that's done, create relationships between your fact tables and the calendar table on the key dates, i.e. when your products became active or expired (I'm making a huge assumption that this is what your tables store).
Your calendar table will then act as a single time/date point of truth and should be used to slice and dice your fact table.
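Whatever measures you end up writing, they come down to comparing each product's dates with the boundaries of the selected year. The Python sketch below shows only that comparison logic, not DAX. It assumes, hypothetically, that each product row has a start date and an end date (None while still active), and it defines "active" as "active at some point during the year"; a year-end snapshot would need a different condition.

```python
from datetime import date

# Hypothetical product history: end is None while a product is still active.
products = [
    {"product": "A", "start": date(2019, 3, 1), "end": None},
    {"product": "B", "start": date(2018, 6, 1), "end": date(2020, 2, 15)},
    {"product": "C", "start": date(2020, 7, 1), "end": None},
    {"product": "D", "start": date(2017, 1, 1), "end": date(2019, 5, 1)},
]


def classify(products, year):
    """Return the products that were active, expired, or new in `year`."""
    first, last = date(year, 1, 1), date(year, 12, 31)
    active, expired, new = [], [], []
    for p in products:
        # Active: the product's active period overlaps the selected year.
        if p["start"] <= last and (p["end"] is None or p["end"] >= first):
            active.append(p["product"])
        # Expired: the end date falls inside the selected year.
        if p["end"] is not None and first <= p["end"] <= last:
            expired.append(p["product"])
        # New: the start date falls inside the selected year.
        if first <= p["start"] <= last:
            new.append(p["product"])
    return {"active": active, "expired": expired, "new": new}


result = classify(products, 2020)
```

Selecting 2020 in the slicer would correspond to `classify(products, 2020)`: A, B and C are active, B expired and C is new.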
Hope this helps!
Connection type: DirectQuery to multiple sources, so the DAX available is limited, especially at the Power Query load stage.
Data model query: the data model is not a perfect star schema, but there is an attempt to separate tables into business processes and lookup tables. There are probably a few issues to discuss with the current data model, but I only have one question at this time.
My current goal is to generate a single summarised customer table to replace the current two tables, which hold some measures I need, such as the number of app customers, the total number of customers and the date each customer first accessed the app.
I cannot merge the two customer tables and add calculated columns and measures at the import stage, because Power BI does not allow it here, and SQL is out because I am using DirectQuery. My plan is to create a summarised customer table with the DAX SUMMARIZE function on the front end (report) side, containing only the app customers, plus measures such as the total number of customers. Is this best practice, or is there a better approach? I understand you would ideally do this in SQL or Power Query, but in these circumstances I think this is the best way; I just wanted a second view.
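To make the plan concrete, the grouping that such a summarised table performs can be sketched in Python: one row per app customer with its first access date (what a MIN over the access date would give), plus separate counts. This is an illustration of the logic only, not DAX; the two input tables, their columns and the sample values are hypothetical stand-ins for the existing customer tables.

```python
from datetime import date

# Hypothetical stand-ins for the two existing customer tables.
all_customers = [101, 102, 103, 104]
app_events = [
    (101, date(2021, 5, 3)),
    (103, date(2021, 2, 10)),
    (101, date(2021, 1, 20)),
]


def summarise_app_customers(app_events):
    """Group app events by customer, keeping each customer's earliest access date."""
    first_access = {}
    for customer_id, accessed in app_events:
        if customer_id not in first_access or accessed < first_access[customer_id]:
            first_access[customer_id] = accessed
    return first_access


summary = summarise_app_customers(app_events)
total_customers = len(set(all_customers))  # analogous to a distinct count measure
app_customers = len(summary)               # one row per app customer
```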
Is there a reason to use DirectQuery over Import? In Import mode, you can easily append the two client tables together in Power Query.
Treb Gatte, Power BI MVP
I’m building a model in Azure Analysis Services. The model should contain data for the last 3 months only and is processed every day.
I have a separate date dimension that is related to a fact table through a datekey. I’m using Power Query to load only the last 3 months into the date dimension. In the Power Query for the fact table, I used Table.NestedJoin to load only the rows that have a matching value in the date table.
When I do this, processing the model takes forever. After some troubleshooting I saw that the query Analysis Services sends to the SQL database retrieves all rows. So, am I correct in saying that AS loads all the data before it merges the rows? Is there a way to change this, or is there a better way to achieve my goal?
Kind regards,
Joins are very slow in Power Query. Avoid them if you can: do the join in the data source, or use normal relationships in the data model.
Also, you can set up the date dimension in DAX and populate it dynamically so that it contains only the dates present in the FACT table.
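A date dimension driven by the fact table can be sketched as "every date from the earliest to the latest fact date", similar in spirit to a DAX CALENDAR table bounded by the MIN and MAX of the fact date column. The Python below is a minimal sketch with hypothetical fact rows. It produces a contiguous range rather than only the distinct fact dates, because date tables are generally expected to have no gaps.

```python
from datetime import date, timedelta

# Hypothetical fact rows: (date_key, amount).
fact = [
    (date(2024, 3, 30), 10.0),
    (date(2024, 4, 2), 5.0),
    (date(2024, 4, 1), 7.5),
]


def build_calendar(fact_dates):
    """Return every date from the earliest to the latest fact date, inclusive."""
    start, end = min(fact_dates), max(fact_dates)
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


calendar = build_calendar([d for d, _ in fact])
```

Because the bounds come from the fact table, the calendar shrinks and grows with the 3-month window automatically each time the model is processed.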
As for loading all the data: it is probably because the data is fetched as-is, and only then does Power Query apply the transformations (the join).
You can modify the query in the Power Query Editor / Advanced Editor to add a WHERE clause directly to the source query.
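The difference between filtering after the fetch and filtering in the source can be shown with Python's built-in sqlite3 module standing in for the SQL database. This is a sketch of the principle only: the table fact_sales, the column date_key and the rows are hypothetical, and "last 3 months" is approximated as 90 days. In Power Query the equivalent filter would sit in the native SQL statement (or in a step that folds back to the source).

```python
import sqlite3
from datetime import date, timedelta


def load_recent_facts(conn, as_of, days=90):
    """Fetch only recent fact rows by filtering inside the source query itself."""
    cutoff = (as_of - timedelta(days=days)).isoformat()
    # The WHERE clause runs in the database, so older rows are never transferred.
    sql = "SELECT date_key, amount FROM fact_sales WHERE date_key >= ? ORDER BY date_key"
    return conn.execute(sql, (cutoff,)).fetchall()


# Hypothetical in-memory source table (ISO date strings compare correctly as text).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (date_key TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [("2023-01-15", 1.0), ("2024-02-01", 2.0), ("2024-04-10", 3.0)],
)
rows = load_recent_facts(conn, as_of=date(2024, 4, 30))
```

With `as_of` set to 2024-04-30 the cutoff is 2024-01-31, so only the two 2024 rows leave the database.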
Google BigQuery (BQ) allows you to create a partition using timestamp or date types only.
99% of my data has a very clear selector, idClient. I've created views for my customers with a predicate like idClient = code, so privacy is guaranteed.
The problem with this strategy is that some customers have 5M rows and others 200K, and since BQ does not have indexes, every customer's query also processes the other customers' data (and the costs are rising).
I intend to create a timestamp field in which each customer gets its own distinct timestamp, repeated on every insert into every customer-sensitive table. I could then partition on that timestamp and query by it, treating it as I would a standard ID.
Does this make sense? If BQ were an indexed database I'd be concerned about skewed data, but since it always does a full table scan, I think I'd get only benefits and no downsides.
The solution to your problem is to add a clustering field to your table, which is roughly equivalent to an index in other databases.
This link covers the basics of using clustering fields.
Clustering can improve the performance of certain types of queries such as queries that use filter clauses and queries that aggregate data. When data is written to a clustered table by a query job or a load job, BigQuery sorts the data using the values in the clustering columns
Note: when using a clustering field, a BigQuery dry run doesn't show the cost improvement; it can only be seen after execution.
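Why sorting by the clustering column helps can be illustrated with a toy model. This is a simplified sketch under assumptions, not BigQuery's real storage layout or pricing (in BigQuery itself the column is declared with CLUSTER BY in the table definition). Here rows are grouped into fixed-size blocks, each block records the min/max idClient it contains, and a filter on one idClient skips every block whose range cannot match.

```python
def make_blocks(rows, key, block_size, cluster=True):
    """Split rows into blocks, optionally sorting by the clustering key first."""
    ordered = sorted(rows, key=lambda r: r[key]) if cluster else list(rows)
    blocks = []
    for i in range(0, len(ordered), block_size):
        chunk = ordered[i:i + block_size]
        keys = [r[key] for r in chunk]
        # Per-block metadata lets a filter skip blocks without reading them.
        blocks.append({"min": min(keys), "max": max(keys), "rows": chunk})
    return blocks


def rows_scanned(blocks, value):
    """Count rows read for `key == value`, skipping blocks whose range excludes it."""
    return sum(len(b["rows"]) for b in blocks if b["min"] <= value <= b["max"])


# Hypothetical table: client 1 has 8 rows, client 2 has 2, inserted interleaved.
rows = [{"idClient": c} for c in [1, 2, 1, 1, 2, 1, 1, 1, 1, 1]]
unclustered = make_blocks(rows, "idClient", block_size=2, cluster=False)
clustered = make_blocks(rows, "idClient", block_size=2, cluster=True)
```

In this toy example a query for client 2 reads 4 rows from the unclustered layout but only 2 from the clustered one, which is the effect that keeps a small customer from paying to scan a large customer's data.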
I have multiple databases which have similar schemas. I need to combine the data from all these databases and do reporting over it.
For example:
Customer table in AdventureWorks in Server 1
Customer table in AdventureWorks in Server 2
Customer table in AdventureWorks in Server 3
In Power BI I will then have a dataset called Customer, whose data needs to come from all 3 servers above. I know I can do this with merge queries in Power BI, but that means pulling the data from each server as a separate dataset in Power BI and then merging them, which I want to avoid.
Do let me know if there is any other way to do this.
You need to use Edit Queries -> Append Queries (or Append Queries as New) to combine the data from these data sources into one table.
I find Append Queries as New more meaningful in this case. It creates a new (fourth) table containing all the rows from the other 3 tables. Select Three or more tables and add all the customer tables from your data sources.
Then use this new table in the report.
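Appending is a simple stacking of rows from tables with the same schema. The Python sketch below is a minimal illustration of that operation, not Power Query itself; the server names, columns and rows are hypothetical. It also adds a Source column, which the answer does not mention. Adding one is optional, but it is handy when the same CustomerID can exist on more than one server.

```python
# Hypothetical rows from the Customer table on each of the three servers.
server1 = [{"CustomerID": 1, "Name": "Ann"}]
server2 = [{"CustomerID": 1, "Name": "Bob"}, {"CustomerID": 2, "Name": "Cy"}]
server3 = [{"CustomerID": 7, "Name": "Dee"}]


def append_tables(sources):
    """Stack rows from same-schema tables, tagging each row with its source."""
    combined = []
    for source_name, rows in sources.items():
        for row in rows:
            combined.append({**row, "Source": source_name})
    return combined


customers = append_tables({"Server1": server1, "Server2": server2, "Server3": server3})
```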
You should not worry that duplicating the data will lead to increased size of the data. The data itself will be reused and not copied twice.
Your model is the right place to combine data from different data sources. Maybe this should not be your report but one dataset shared between your reports, or an SSAS model. Combining the data at the SQL Server level (e.g. a partitioned view over 3 databases) is not a good idea. Combining it in your model also lets you mix different types of data sources, e.g. Customers 1 is in SQL Server, Customers 2 is a .csv file in OneDrive and Customers 3 comes from a web service call.