I wanted to know what the best approach would be for creating the dim tables. Can I maintain them as a single table with all fields and use the fields as required, or should I create separate dim tables and use them individually?
Can someone please help me out here?
PS: I'm a beginner here.
Creating one table per dimension is the best practice. In data warehousing, you will come across four types of schema:
Star Schema
Snowflake Schema
Galaxy Schema
Combined Schema
People choose one of these based on the nature of their data, their requirements and other factors. But in every case, each dimension gets its own table. This is easier to maintain and gives better performance.
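To make the one-table-per-dimension idea concrete, here is a minimal sketch in plain Python (not Power BI or SQL) of a star schema. The table names (`fact_sales`, `dim_product`, `dim_date`) and their columns are hypothetical: the fact table holds only keys and measures, and each dimension is a separate lookup that is joined in only when a query groups by one of its attributes.

```python
# Hypothetical star schema: one fact table plus one table per dimension.

# Dimension tables: one row per member, keyed by a surrogate key.
dim_product = {
    1: {"product_name": "Bolt", "category": "Hardware"},
    2: {"product_name": "Drill", "category": "Tools"},
}
dim_date = {
    20240101: {"year": 2024, "month": 1},
    20240201: {"year": 2024, "month": 2},
}

# Fact table: only foreign keys and numeric measures.
fact_sales = [
    {"product_key": 1, "date_key": 20240101, "amount": 10.0},
    {"product_key": 2, "date_key": 20240101, "amount": 99.0},
    {"product_key": 1, "date_key": 20240201, "amount": 5.0},
]


def sales_by(dimension, fk_column, attribute):
    """Sum fact amounts grouped by one attribute of one dimension table."""
    totals = {}
    for row in fact_sales:
        member = dimension[row[fk_column]]  # the "join" to the dimension
        key = member[attribute]
        totals[key] = totals.get(key, 0.0) + row["amount"]
    return totals


print(sales_by(dim_product, "product_key", "category"))  # {'Hardware': 15.0, 'Tools': 99.0}
print(sales_by(dim_date, "date_key", "month"))  # {1: 109.0, 2: 5.0}
```

Because the category name lives in one row of `dim_product`, renaming it touches a single record instead of every fact row, which is one reason separate dimension tables are easier to maintain.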
Related
I need help with this issue as I don't have any experience with Power BI. I want to join two tables in Power BI that share the same column, Part_Number. How can I match these two tables by part number and return the values?
Recon Table
Inventory Table
I would like the result to contain Part Number, Part Name, QTY and Total Quantity. I hope I can get the clarification I need. Thanks a lot!
In this case you simply need to merge the tables. It doesn't look like you have done much research on the matter, though, so it's hard to understand exactly what you need help with.
To merge your two tables in Power Query, I would right-click in the left-hand side menu and select Merge Queries as New.
After that, simply follow the on-screen instructions and select your two tables and their respective key columns. After merging, you can choose to disable loading of your two original tables to save space in your data model, but this depends on your requirements.
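What Merge Queries does with the key columns can be sketched in plain Python. This assumes the default Left Outer join kind and uses made-up sample rows, since the real Recon and Inventory tables are not shown; `left_outer_merge` is a hypothetical helper, not a Power Query function. Power Query itself returns the matched rows as a nested table column that you then expand, whereas this sketch flattens them directly.

```python
# Hypothetical sample rows; the real Recon and Inventory tables are not shown.
recon = [
    {"Part_Number": "P-100", "QTY": 4},
    {"Part_Number": "P-200", "QTY": 1},
    {"Part_Number": "P-300", "QTY": 7},
]
inventory = [
    {"Part_Number": "P-100", "Part_Name": "Bracket", "Total_Quantity": 50},
    {"Part_Number": "P-200", "Part_Name": "Hinge", "Total_Quantity": 12},
]


def left_outer_merge(left, right, key):
    """Keep every left row and append the columns of its matching right row."""
    lookup = {row[key]: row for row in right}  # assumes `key` is unique in `right`
    right_cols = [col for col in right[0] if col != key] if right else []
    merged = []
    for row in left:
        match = lookup.get(row[key], {})
        # Unmatched rows get None for the right-hand columns, like null in Power Query.
        merged.append({**row, **{col: match.get(col) for col in right_cols}})
    return merged


for row in left_outer_merge(recon, inventory, "Part_Number"):
    print(row)
```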
If this were my data model, I would think about why joining these tables is necessary, instead of using the two tables as fact tables and creating a third table to handle the part number dimension with the associated part metadata.
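The alternative suggested above, a shared part dimension instead of a merge, can be sketched the same way. Again the rows and the helpers `build_part_dimension` and `total_by_part` are hypothetical: the dimension holds each distinct part number once, and both fact tables are summarised against it, which is roughly what a one-to-many relationship from the dimension to each fact table lets Power BI do.

```python
# Two hypothetical fact tables that both reference parts by Part_Number.
recon = [
    {"Part_Number": "P-100", "QTY": 4},
    {"Part_Number": "P-300", "QTY": 7},
]
inventory = [
    {"Part_Number": "P-100", "Part_Name": "Bracket", "Total_Quantity": 50},
    {"Part_Number": "P-200", "Part_Name": "Hinge", "Total_Quantity": 12},
]


def build_part_dimension(*fact_tables):
    """One entry per distinct Part_Number, keeping a Part_Name when one is known."""
    dim = {}
    for table in fact_tables:
        for row in table:
            entry = dim.setdefault(row["Part_Number"], {"Part_Name": None})
            if row.get("Part_Name") is not None:
                entry["Part_Name"] = row["Part_Name"]
    return dim


def total_by_part(dim, fact_table, measure):
    """Sum a measure per dimension member; parts with no fact rows get 0."""
    totals = {part: 0 for part in dim}
    for row in fact_table:
        totals[row["Part_Number"]] += row[measure]
    return totals


dim_part = build_part_dimension(recon, inventory)
qty = total_by_part(dim_part, recon, "QTY")
stock = total_by_part(dim_part, inventory, "Total_Quantity")
for part, attrs in dim_part.items():
    print(part, attrs["Part_Name"], qty[part], stock[part])
```

Neither fact table is modified; both are simply sliced by the same `dim_part`, so a part that appears in only one of them still shows up with a zero for the other.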
Read the docs: Merge queries in Power Query
We're trialling PowerBI on a Snowflake dimensional model, and performance seems very poorly optimised. Can anyone point me to information on best practices for this connection? I've previously used Tableau, and there's an excellent white paper describing the pros and cons of each connection type and how to set it up so that as much of the heavy lifting as possible is done in Snowflake, with minimal load on the viz tool.
For example, when you summarise 1 million invoices into a chart of sales volume by year that distils them to 10 data points, Tableau would send 'SELECT year, sum(volume) FROM t GROUP BY year' (~10 rows), but in PowerBI we see Snowflake receiving a query like 'SELECT invoice_id, sum(volume) FROM t GROUP BY invoice_id' (~1M rows), leaving the viz tool to do a lot more work.
So far, we've tried mapping the individual facts and dimensions within PowerBI, and also using a mix of direct query and import, but without significant improvement. Is there any guidance on best practice?
Thanks in advance!
I've never used Snowflake, and I have no idea how PowerBI interfaces with it. That said, on the PowerBI side you may be interested in composite models and aggregations.
MS Docs:
https://learn.microsoft.com/en-us/power-bi/desktop-composite-models
https://learn.microsoft.com/en-us/power-bi/desktop-storage-mode
https://learn.microsoft.com/en-us/power-bi/desktop-aggregations
Radacad's blog about aggregations:
https://radacad.com/power-bi-fast-and-furious-with-aggregations
https://radacad.com/dual-storage-mode-the-most-important-configuration-for-aggregations-step-2-power-bi-aggregations
In practice, when you use a composite model, the aggregation functionality lets you create a hidden table (in Import mode) in your model containing aggregated data (by year, month, customer, etc.).
When you then query your data, PowerBI checks whether this table can answer the query. If it can, it just reads the data from this table; otherwise, it runs a query against the source (DirectQuery).
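The matching rule described above can be modelled in a few lines of Python. This is a toy illustration of the idea, not how Power BI's engine is implemented: `agg_rows` plays the role of the hidden Import-mode aggregation table at a (year, customer) grain, `invoices` plays the DirectQuery detail table, and `answer` is a hypothetical router that uses the aggregate only when the requested grouping is covered by its grain.

```python
from collections import defaultdict

# Hypothetical detail rows standing in for the DirectQuery invoice table.
invoices = [
    {"invoice_id": 1, "year": 2022, "customer": "A", "volume": 10},
    {"invoice_id": 2, "year": 2022, "customer": "B", "volume": 5},
    {"invoice_id": 3, "year": 2023, "customer": "A", "volume": 7},
]


def group_sum(rows, keys, measure):
    """SUM(measure) GROUP BY keys, returned as {tuple_of_key_values: total}."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[measure]
    return dict(totals)


# Pre-aggregated "import" table at (year, customer) grain, built at refresh time.
AGG_GRAIN = ("year", "customer")
agg_rows = [
    {"year": y, "customer": c, "volume": v}
    for (y, c), v in group_sum(invoices, AGG_GRAIN, "volume").items()
]


def answer(group_by):
    """Use the aggregation table when its grain covers the query, else the source."""
    if set(group_by) <= set(AGG_GRAIN):
        return "aggregation", group_sum(agg_rows, group_by, "volume")
    return "source", group_sum(invoices, group_by, "volume")


print(answer(["year"]))  # ('aggregation', {(2022,): 15, (2023,): 7})
print(answer(["invoice_id"]))  # ('source', {(1,): 10, (2,): 5, (3,): 7})
```

Re-summing pre-aggregated rows is only valid for additive measures such as sums; a distinct count, for example, cannot be rebuilt from per-customer subtotals this way.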
The example you shared about PowerBI querying the source without asking for aggregation (but instead asking for every single InvoiceId) might be caused by not setting up the composite model correctly.
A table in DirectQuery mode cannot reference other tables in its source query (in this case the calendar table) unless those tables are also in DirectQuery or Dual mode.
What does the model look like in the case you shared, and what is the storage mode of each table?
Hello M language masters!
I have a question about working with grouped rows when the Power Query creates a table with data. But maybe it is better to start from the beginning.
Important: I'm only using adding an index as an example. I know there are other ways to get that result, but for this question I need an answer about whether it is possible to work on the nested tables themselves. I want to reuse the answer for other operations (e.g. sorting the tables or adding columns to the grouped tables).
In my sample data source, I have a list of fake transactions. I want to add an index for each Salesman, to count operations for each of them.
Sample Data
So I added this file as a data source in Power BI. In Power Query, I grouped the rows by name. This step created a column containing a table for each Salesman, which stores all of his or her operations.
Grouping result
Now I want to add an index column to each table. I know this is possible by adding a new column to the main table, which stores a new table with the index added:
Custom column function
Each table now has an index, which is good. But I now have an extra column (one with the tables without the index, and one with the tables with the index).
Result - a little bit messy
So is there any way to add such an index directly to the tables in the Operations column, without creating an additional column? My approach seems a bit messy, and I'd like to find something cleaner. Does anyone know a smart solution for this?
Thank you in advance.
Artur
Sure, you can do it inside the Table.Group function:
= Table.Group(Source, {"Salesman"}, {"Operations", each Table.AddIndexColumn(_, "i", 1, 1)})
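For readers less familiar with M, here is a rough plain-Python analogue of that single Table.Group step, with made-up transaction rows. The aggregation function receives each salesman's rows (the `_` in the M code) and returns them with a 1-based index added, so the grouped output has just one nested-table column. `group_with_index` is a hypothetical helper, and this sketch keeps groups in first-seen order.

```python
# Hypothetical transaction rows (the real sample file is not shown).
transactions = [
    {"Salesman": "Anna", "Amount": 100},
    {"Salesman": "Bob", "Amount": 40},
    {"Salesman": "Anna", "Amount": 75},
    {"Salesman": "Bob", "Amount": 10},
    {"Salesman": "Anna", "Amount": 20},
]


def group_with_index(rows, key, nested="Operations", index_name="i", start=1):
    """Group rows by `key`; each nested table gets a running index, no extra column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)  # first-seen group order
    return [
        {key: value, nested: [{**r, index_name: n} for n, r in enumerate(members, start)]}
        for value, members in groups.items()
    ]


for group in group_with_index(transactions, "Salesman"):
    print(group["Salesman"], [(op["i"], op["Amount"]) for op in group["Operations"]])
# Anna [(1, 100), (2, 75), (3, 20)]
# Bob [(1, 40), (2, 10)]
```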
P.S. To add an existing index column from the outer table to each nested table, use this code:
= Table.ReplaceValue(PreviousStep,each [index],0,(a,b,c)=>Table.AddColumn(a,"index", each b),{"Operations"})
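The Table.ReplaceValue trick works because the value to replace is given as `each [index]`, so it is evaluated per outer row, and the custom replacer `(a,b,c)=>...` receives the nested table (`a`) and that row's index (`b`); the `0` passed as the new value (`c`) is never used. The same transformation in plain Python, using a hypothetical helper and sample data, looks like this:

```python
# Hypothetical grouped table: each outer row has an "index" and a nested table.
grouped = [
    {"Salesman": "Anna", "index": 1, "Operations": [{"Amount": 100}, {"Amount": 75}]},
    {"Salesman": "Bob", "index": 2, "Operations": [{"Amount": 40}]},
]


def push_down_column(table, column, nested="Operations"):
    """Replace each nested table with a copy that also carries the outer row's `column`."""
    return [
        {**outer, nested: [{**inner, column: outer[column]} for inner in outer[nested]]}
        for outer in table
    ]


result = push_down_column(grouped, "index")
print(result[0]["Operations"])  # [{'Amount': 100, 'index': 1}, {'Amount': 75, 'index': 1}]
```

As with the M version, the nested column is replaced in place, so no extra column appears in the outer table.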
I have a table of dates (every date from 2003 to 2035) in my data model, but I'm wondering whether I need to create relationships between it and my other data tables. Could anyone please share the best practice?
If so, my main table has several date columns, so which one should I link to?
To be honest, I'm thinking I shouldn't create a relationship, since any filtering of the date table would then filter my model only by the date column that has the relationship. Is that right?
I hope all that makes sense. It's more of an abstract question at the moment, but my ultimate goal is to create some kind of rolling average.
Thanks in advance.
The best practice is clearly to create a relationship between your date table and your data table (your fact table, I assume). But you have to choose the most relevant column for the link, bearing in mind that it's preferable not to create multiple relationships between the same two tables.
For example, if you have a "snapshot date" column, you could link on that one to see the status for each period. It really is up to you.
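Here is a plain-Python sketch of why the relationship matters for a rolling average; the sample rows and helper names are hypothetical. The calendar contains every day, so days without sales count as zero, and the `date_column` argument stands in for whichever fact column the relationship uses: the same rows give different averages when followed by `order_date` or by `ship_date`.

```python
from datetime import date, timedelta


def date_table(start, end):
    """Every calendar day from start to end inclusive, like a date dimension."""
    return [start + timedelta(days=n) for n in range((end - start).days + 1)]


# Hypothetical fact rows with two date columns.
sales = [
    {"order_date": date(2024, 1, 1), "ship_date": date(2024, 1, 3), "amount": 30},
    {"order_date": date(2024, 1, 3), "ship_date": date(2024, 1, 4), "amount": 60},
    {"order_date": date(2024, 1, 4), "ship_date": date(2024, 1, 6), "amount": 30},
]


def daily_totals(facts, date_column, calendar):
    """Total amount per calendar day, following one date column (the 'relationship')."""
    totals = {d: 0 for d in calendar}
    for row in facts:
        totals[row[date_column]] += row["amount"]
    return totals


def rolling_average(totals, window):
    """Trailing average over `window` calendar days (fewer at the start of the range)."""
    days = sorted(totals)
    averages = {}
    for i, day in enumerate(days):
        span = days[max(0, i - window + 1): i + 1]
        averages[day] = sum(totals[d] for d in span) / len(span)
    return averages


calendar = date_table(date(2024, 1, 1), date(2024, 1, 6))
by_order = rolling_average(daily_totals(sales, "order_date", calendar), 3)
by_ship = rolling_average(daily_totals(sales, "ship_date", calendar), 3)
for day in calendar:
    print(day, by_order[day], by_ship[day])
```

The first days of the range average over fewer than three days; whether that is acceptable depends on how you want the edges of the period handled.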
If the filtering is annoying to you, you can always disable it on the visuals.
I hope it helps.
I'm in the process of learning how to properly pull the appropriate metadata from a Teradata database, and a large part of what I need is to pull all existing primary/foreign keys within a database. I am still very much a beginner with Teradata, as well as with big data in general, so a simplified explanation would be nice.
A simplified version of a select statement would also be incredibly helpful. Thanks in advance.
Foreign Keys: dbc.All_RI_ParentsV[X]
PK/Unique: dbc.IndicesV[X]. Unique indexes have UniqueFlag 'Y'; if the index was defined as a PK in the CREATE TABLE, IndexType will be P. Multi-column indexes have one row per column, all sharing the same IndexNumber; IndexNumber 1 is always the PI.
But as Teradata is usually used as a data warehouse, you may find tables without a defined PK, and you will rarely find any defined FKs.
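A simplified version of such queries is sketched below, held as Python strings together with a helper that collapses the one-row-per-column output into one column list per index. The column names (`ColumnName`, `ColumnPosition`, `ChildTable`, `ParentKeyColumn`, and so on) are assumptions about these dbc views and should be checked against your Teradata release; `my_db` and the sample rows are hypothetical. Executing the SQL would need a Teradata driver, which is left out here.

```python
from collections import defaultdict

# Simplified metadata queries. Column names are ASSUMED from the dbc views
# mentioned above; 'my_db' is a hypothetical database name.
PK_UNIQUE_SQL = """
SELECT TableName, IndexNumber, IndexType, ColumnName, ColumnPosition
FROM dbc.IndicesV
WHERE DatabaseName = 'my_db'
  AND UniqueFlag = 'Y'
ORDER BY TableName, IndexNumber, ColumnPosition
"""

FK_SQL = """
SELECT ChildTable, ChildKeyColumn, ParentDB, ParentTable, ParentKeyColumn
FROM dbc.All_RI_ParentsV
WHERE ChildDB = 'my_db'
ORDER BY ChildTable, ParentTable
"""


def index_columns(rows):
    """Collapse one row per column into one ordered column list per (table, index)."""
    keys = defaultdict(list)
    ordered = sorted(rows, key=lambda r: (r["TableName"], r["IndexNumber"], r["ColumnPosition"]))
    for r in ordered:
        keys[(r["TableName"], r["IndexNumber"], r["IndexType"])].append(r["ColumnName"])
    return dict(keys)


# Hypothetical result rows, shaped like the PK/unique query above might return.
sample_rows = [
    {"TableName": "orders", "IndexNumber": 1, "IndexType": "P",
     "ColumnName": "order_id", "ColumnPosition": 1},
    {"TableName": "order_lines", "IndexNumber": 1, "IndexType": "P",
     "ColumnName": "line_no", "ColumnPosition": 2},
    {"TableName": "order_lines", "IndexNumber": 1, "IndexType": "P",
     "ColumnName": "order_id", "ColumnPosition": 1},
]

print(index_columns(sample_rows))
```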