How to reduce the size of a pbix file - powerbi

I am going to use a simple scenario to simplify my question.
I have Table A (1,000 records), which holds 5 years' worth of data, and Table B (1,000,000 records), which holds 20 years' worth of data.
Table A also has a column containing the key used to join to Table B. The key points to the earliest created record in Table B.
I am using import mode to load this data. When I load both tables, all the records from both tables are imported. I only want to bring in the records from Table B that join to Table A, similar to an INNER JOIN.
I tried using the merge functionality and selecting INNER as the join type. In theory, this should retrieve only 1,000 records, but when the data is loaded, all records from both tables are loaded into Power BI Desktop.
I am trying to reduce the dataset size by only retrieving the relevant records from table B but not having any luck.
Does anyone have any suggestions?

Import Table A and Table B into the query editor, do the inner join to create a new Table C that only has the matching rows.
Then right-click Table A and Table B in the query list and uncheck "Enable Load" so that those tables are only used as connections rather than being loaded into the data model and saved in the PBIX.
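If both tables happen to live in the same SQL database (an assumption; the question does not say what the source is), another option is to filter Table B at the source so that the unneeded rows never reach Power BI. The sketch below uses hypothetical names (dbo.TableA, dbo.TableB, TableBKey) and could be pasted as a native SQL statement when connecting to the database:

```sql
-- Hypothetical table and column names; adjust to the real schema.
-- Returns only the rows of Table B that are referenced by Table A,
-- which is what the INNER JOIN in the merge step is meant to achieve.
SELECT b.*
FROM dbo.TableB AS b
WHERE EXISTS (
    SELECT 1
    FROM dbo.TableA AS a
    WHERE a.TableBKey = b.TableBKey
);
```

Using EXISTS rather than a JOIN avoids duplicating Table B rows if several Table A rows point to the same key.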

Related

PowerBI Data Model Not Recognized

I have a data model set up in PowerBI. I have a date table that's the fact table, and numerous data tables that are dimension tables.
Here is a picture of the data model. They are all connected Date[Date] <--> Table[Timestamp].
The table in question is the bottom table, RDS-WST00X-BZF-Insolation.POA.AM
Here is the table where the issue can be seen. It's like Measured Insolation doesn't see the Date Table, even though there is a connection set up in the Data Model
Here is the measure in which Measured Insolation is calculated from the RDS-WST00X-BZF-Insolation.POA.AM table
Finally, here are screenshots of both the Date Table and the RDS-WST00X-BZF-Insolation.POA.AM table

how to create partition and cluster on an existing table in big query?

In SQL Server, we can create an index like this. How do we create the index after the table already exists? What is the syntax for creating a clustered index in BigQuery?
CREATE INDEX abcd ON `abcd.xxx.xxx`(columnname )
In BigQuery, we can create a table like below. But how do we create partitioning and clustering on an existing table?
CREATE TABLE rep_sales.orders_tmp PARTITION BY DATE(created_at) CLUSTER BY created_at AS SELECT * FROM rep_sales.orders
As @Sergey Geron mentioned in the comments, BigQuery doesn't support indexes. For more information, please refer to this doc.
An existing table cannot be partitioned but you can create a new partitioned table and then load the data into it from the unpartitioned table.
As for clustering of tables, BigQuery supports changing an existing non-clustered table to a clustered table and vice versa. You can also update the set of clustered columns of a clustered table. This method of updating the clustering column set is useful for tables that use continuous streaming inserts because those tables cannot be easily swapped by other methods.
You can change the clustering specification in the following ways:
Call the tables.update or tables.patch API method.
Call the bq command-line tool's bq update command with the --clustering_fields flag.
Note: When a table is converted from non-clustered to clustered or the clustered column set is changed, automatic re-clustering only works from that time onward. For example, a non-clustered 1 PB table that is converted to a clustered table using tables.update still has 1 PB of non-clustered data. Automatic re-clustering only applies to any new data committed to the table after the update.
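The copy-into-a-new-table approach for partitioning can be sketched in BigQuery SQL as follows. It reuses the table and column from the question; the name orders_partitioned is an assumption made for illustration, and the swap at the end is optional and should only be run after the copy has been verified:

```sql
-- Step 1: create a new partitioned (and clustered) table from the
-- existing unpartitioned one. Assumes created_at is a TIMESTAMP column,
-- as in the question's statement.
CREATE TABLE rep_sales.orders_partitioned
PARTITION BY DATE(created_at)
CLUSTER BY created_at
AS
SELECT * FROM rep_sales.orders;

-- Step 2 (optional): once the copy is verified, replace the old table
-- so that existing queries keep using the original name.
DROP TABLE rep_sales.orders;
ALTER TABLE rep_sales.orders_partitioned RENAME TO orders;
```

Because the new table is written in full, all of its data is partitioned and clustered from the start, unlike a clustering change applied to an existing table through tables.update.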

Can an Azure table be split on partition keys in PowerBI?

I am aware that an Azure Table has a composite key that is made up of a RowKey and a PartitionKey. I am also aware that you can pull an Azure Table into PowerBI. I am new to PowerBI, so I am not sure if I am using the right term, but what I would like to be able to do is break my Azure Table into multiple tables in PowerBI based on the PartitionKey. Is this possible? If so, can someone point me in the right direction?
Thanks
Import all your Azure Table data as one PowerQuery table. Then right-click on the table in the PowerQuery editor and select Reference. This will give you a new table that points to the Azure Table data; call it "Partition Key Link" or "Partition Key Bridge". Remove all the row data columns so that only the partition key column remains. Right-click on the partition key column header and select "Remove Duplicates". You now have a table of distinct partition keys. Then go to your PowerBI model view and create a relationship between the link table and the data from your Azure Table. You can then link your other data to the link table in order to get to a model that will work well in PowerBI.

Problems loading data in to Analysis Services Model

I’m building a model in Azure Analysis Services. The model should contain only data for the last 3 months and is processed every day.
I have a separate date dimension that is related to a fact table through a datekey. I’m using a Power Query query to load only the last 3 months into the date dimension. In the Power Query query that loads the fact table, I used Table.NestedJoin to load only the rows that have a value in the date table.
When I do this, processing the model takes forever. After some troubleshooting, I saw that the query Analysis Services uses to retrieve data from the SQL database retrieves all rows. So am I correct in saying that AS loads all the data before it merges the rows? Is there a way to change this? Or is there a better way to achieve my solution?
Kind regards,
Joins are super slow in Power Query. You should avoid them if you can do it in the datasource or use normal relationships in the data model.
Also, you can setup the date dimension in DAX and dynamically populate it to contain only dates present in the FACT table.
As for the load of all the data, it could be because the data is fetched as is, and only then power query applies the transformations (the join).
You can modify the query in the Power Query Editor / Advanced Editor to add a WHERE clause directly in the query, so that the filtering is done by the database.
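As a rough illustration of such a WHERE clause, the T-SQL below restricts the fact table to the last 3 months at the source. The table name dbo.FactSales and the column DateKey are hypothetical, and the sketch assumes the key is an integer in yyyymmdd form; a date-typed column could be compared to the DATEADD result directly:

```sql
-- Hypothetical names: dbo.FactSales, DateKey (int, yyyymmdd).
-- The filter runs in SQL Server, so only the last 3 months of rows
-- are sent to Analysis Services during processing.
SELECT f.*
FROM dbo.FactSales AS f
WHERE f.DateKey >= CONVERT(int, CONVERT(char(8), DATEADD(month, -3, GETDATE()), 112));
```

Style 112 formats the date as yyyymmdd, which makes the integer comparison line up with the key format assumed here.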

Azure SQL Data Warehouse CTAS statistics

Does the "Create table as" function in SQL Data Warehouse create statistics in the background, or do they have to be created manually (as I would after a normal "Create table" statement)?
As of the current version, you always have to create column-level statistics on tables, irrespective of whether it was created with a normal CREATE TABLE or the CTAS CREATE TABLE AS... command. It's also good practice to create stats for columns used in JOINs, WHERE clauses, GROUP BY, ORDER BY and DISTINCT clauses.
Regarding tables created with CTAS, the database engine has a correct idea of how many rows are in the table as listed in sys.partitions, but it has no column-level statistics. For tables created by CREATE TABLE, the row count defaults to 1,000 rows. In one example, the first table was created with a CTAS and has 208 rows, and the second table was created with an ordinary CREATE TABLE and an INSERT from the first table, so it also has 208 rows, but sys.partitions believes it to have 1,000.
Creating any column-level statistics manually will correct this number.
In summary, always manually create statistics against important columns irrespective of how the table was created.
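A minimal sketch of creating such statistics is shown below. The table dbo.FactOrders and its columns are hypothetical names chosen for illustration; the statistics names are arbitrary:

```sql
-- Hypothetical table and columns; pick the columns used in JOINs,
-- WHERE clauses, GROUP BY, ORDER BY and DISTINCT.

-- Single-column statistics using the default sampling.
CREATE STATISTICS stats_FactOrders_CustomerKey
ON dbo.FactOrders (CustomerKey);

-- Statistics built by scanning every row instead of a sample.
CREATE STATISTICS stats_FactOrders_OrderDateKey
ON dbo.FactOrders (OrderDateKey) WITH FULLSCAN;
```

The same statements apply whether the table was created with CTAS or with a normal CREATE TABLE followed by an INSERT.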