Incremental loading for a fact table in Informatica

I am building SCD Type 1 mappings. I have these source tables:
1. CUSTOMERS (source customer file)
2. PRODUCTS (source product file)
3. SALES (source sales file)
These are the dimension and fact tables:
1. D_CUSTOMERS (SCD 1)
2. D_PRODUCTS (SCD 1)
3. F_SALES_DLY (fact)
SALES contains daily-level data. The fact table will hold sales and quantity aggregated per product per month, and it should be loaded incrementally while always retaining the last 24 months of data.
I am a bit confused about how to make the fact table incremental. If I use a transformation (e.g. a Filter) to pick up only the last 24 months of records, is that a feasible solution?
Please suggest something.
Thanks.
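To make the feasibility question concrete: a filter only controls what you read in, while keeping the table at 24 months also needs a purge of old rows. Below is a rough sketch of the two halves, expressed in SQL terms. Only F_SALES_DLY and SALES come from the question; MONTH_START_DATE, SALE_DATE, SALES_AMT, QUANTITY and the Oracle-style date functions are assumptions, and since the source is a flat file the SELECT would in practice be a Filter plus an Aggregator inside the mapping rather than a SQL override.

-- 1) Purge months that have fallen outside the rolling 24-month window
--    (e.g. as a pre-/post-session SQL against the target).
DELETE FROM F_SALES_DLY
WHERE MONTH_START_DATE < ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -24);

-- 2) The aggregation the mapping performs on the incoming daily data:
--    only the current month's rows, summed per product per month.
SELECT PRODUCT_ID,
       TRUNC(SALE_DATE, 'MM') AS MONTH_START_DATE,
       SUM(SALES_AMT)         AS SALES_AMT,
       SUM(QUANTITY)          AS QUANTITY
FROM   SALES
WHERE  SALE_DATE >= TRUNC(SYSDATE, 'MM')
GROUP BY PRODUCT_ID, TRUNC(SALE_DATE, 'MM');

Filtering the whole history down to 24 months on every run would also work, but it amounts to a full reload; purging old months and loading only the new period keeps the load genuinely incremental.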

Related

What is best practice to deal with missing data according to Kimball?

I have a database with the following tables:
Customers, Invoices, Salesman, Target.
The ones my question concerns are Customers and Invoices.
There are customer IDs used in Invoices that don't exist in the Customers table.
If I used only the customers from the Customers table, my customer dimension would be incomplete.
My solution is to append these IDs from Invoices to Customers and fill the other columns in the Customers table with nulls.
I don't know if this is the best approach according to Kimball.
Also, if it is a good solution, how can I accomplish it in Power BI Desktop?
Customers table: (generated sample data)
Invoices table: (just a sample; the real table is thousands of rows)
There are two points here.
Firstly, (in Import mode at least) Power BI already creates a "blank row" for items present in your fact table but missing from your dimension table, for precisely this scenario. If you don't need the granularity of each individual missing customer ID, then you don't need to do anything.
Secondly, if you do need to retain that granularity, then your approach is the correct one. The way to do this in Power Query is as follows (the same logic is sketched in SQL after the steps):
Create a new query which takes your customer dimension table and does a left outer join on customer ID with your invoice fact table.
Expand the newly joined table, but retain only the new customer ID column.
Remove all columns apart from the new customer ID column.
Remove duplicates.
You now have a list of missing customer IDs. Ensure the column name is the same as the name of the customer ID column in the customer dimension table. Append this to the original customer dimension query and the nulls will be filled in automatically for the missing columns.
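If it helps to see the intent, the same logic expressed in SQL terms (not Power Query M; Customers, Invoices and CustomerID are the names from the question) is an anti-join followed by an append:

-- Append customer IDs that appear in Invoices but not in Customers;
-- the remaining dimension columns stay NULL, to be filled as needed.
INSERT INTO Customers (CustomerID)
SELECT DISTINCT i.CustomerID
FROM   Invoices i
LEFT JOIN Customers c ON c.CustomerID = i.CustomerID
WHERE  c.CustomerID IS NULL;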
Please keep in mind that it is Kimball, not Kimble.
There are four steps in the Kimball DWH methodology:
1) Understand the business process (what is the process actually measuring?)
2) Declare the grain (what does each row in your fact table actually represent?)
3) Choose the dimensions (ask who, what, where, when, how, how many and how much of the grain you declared together with the business process)
4) Identify the facts (metrics)
Following this order, you define dimension tables before building your fact tables. If your dimension table (the Customers table in this case) is missing customers that appear in your fact table, my strongest advice, per dimensional modelling, is to set your Customers table right: define every customer in your dimension table, and only then populate your fact table with records.
[Customer ID] in the Customers table: PRIMARY KEY
[CustomerID] in the Invoices table: FOREIGN KEY
SQL and Power BI react very differently to this problem:
1) Power BI has no referential-integrity concept: in such a case it adds a blank row to your dimension table.
2) SQL raises a referential-integrity error, and you can't even add those rows to your fact table (see the sketch below). I personally side with SQL here.
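For illustration, the behaviour in point 2 comes from a foreign-key constraint such as the following (a DDL sketch using the table names above):

-- With this constraint, inserting an invoice whose CustomerID does not
-- exist in Customers fails with a referential-integrity error.
ALTER TABLE Invoices
  ADD CONSTRAINT FK_Invoices_Customers
  FOREIGN KEY (CustomerID) REFERENCES Customers (CustomerID);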
Finally, use an ETL tool (SSIS, Talend, ODI or even Power Query) to make your dimension table as accurate as possible. For example:
Do not leave any column value as NULL.
If a date is unknown, put in a default date value such as '1900-12-31'.
If a textual property is unknown, put in a keyword such as 'Unknown' or 'Not available'.
Dimension tables are what your SQL queries filter and group on, and different SQL vendors (SQL Server, Oracle, MySQL) handle NULL values differently, which causes problems performance-wise. A sketch of this cleanup follows.
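A minimal sketch of that cleanup during the dimension load (the staging table and the attribute columns here are illustrative, not from the question):

-- Replace unknown values with explicit defaults while loading the dimension.
INSERT INTO Customers (CustomerID, CustomerName, City, FirstPurchaseDate)
SELECT CustomerID,
       COALESCE(CustomerName, 'Unknown'),
       COALESCE(City, 'Not available'),
       COALESCE(FirstPurchaseDate, DATE '1900-12-31')
FROM   StagingCustomers;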

Add missing month rows based on multiple columns with multiple variables in Power BI

I'm very stuck and I was hoping you could help. I have the following dataset (Table 1) with Month (5 years' worth), Customer (1,000 customers), Product (100 products), Units and Value (Value is just Units multiplied by a price). The data only contains rows that have units and value, so when a customer has no sale in a month there is no row at all.
Click here for Table 1
I want to create a table (Table 2) where every product for every customer is shown for all time periods, with the actual units and values included and the combinations missing from Table 1 now showing 0.
Click here for Table 2
I have read many posts here and elsewhere, but they only handle one column (e.g. only Customer, not both Customer AND Product) and only one measure, not both Units and Value. I tried to adapt the code but failed miserably.
I also want to do this in Power BI using M, not DAX, because I would like to further transform the data.
Thank you so much everyone!
Good afternoon.
You can use DAX to create a calendar for the required period of time.
Use the minimum and maximum values from Table 1 for the interval (CALENDAR function).
Calendar = CALENDAR(DATE(2022, 5, 1), TODAY() - 1)
Link the calendar to the necessary dates.
(calendar link example)
In the settings of the visual to which you output the data, enable the "Show items without data" setting, and take the date from the calendar.
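If you would rather materialize the dense table described in the question instead of relying on "Show items without data", the underlying logic is a cross join of the distinct months, customers and products, left-joined back to the data. A conceptual sketch in SQL terms (not the Power Query M the question asks for; Table1 and the column names come from the question):

-- Every Month x Customer x Product combination, with missing rows shown as 0.
SELECT m.Month,
       c.Customer,
       p.Product,
       COALESCE(t.Units, 0) AS Units,
       COALESCE(t.Value, 0) AS Value
FROM       (SELECT DISTINCT Month    FROM Table1) m
CROSS JOIN (SELECT DISTINCT Customer FROM Table1) c
CROSS JOIN (SELECT DISTINCT Product  FROM Table1) p
LEFT JOIN  Table1 t
       ON  t.Month    = m.Month
       AND t.Customer = c.Customer
       AND t.Product  = p.Product;

The same shape can be built in Power Query with three distinct lists, a cross join via a custom column, a left merge back to Table 1, and a final replace of nulls with 0.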

How to build relationship based on range of dates (periods) in Power BI

I'm trying to figure out the best way to build a relationship from a table that has records in a daily format (one record represents a single date) to a table that contains records in a date-range format (one record has a start date and an end date, consequently representing a period or range of dates).
Since my actual datafiles contain work-related information, I created 2 demo tables that contain dummy data that reflects the date columns in question.
Here is my DailyDate table
Here is my DateRanges table
Here is the current model view
I would like to have a relationship built between the tables so that if I had two tables/matrices in the Report view, one showing the Daily Date Data and the other showing the Ranged Date Data, I could select a record in the Daily table and Power BI's highlight functionality would filter the records in the Ranged table so that only date ranges containing my selected date appear, and vice versa (if possible).
For example, referencing this screenshot, if I were to select index 0 in the 'Daily Date Data' table, the 'Ranged Date Data' table should be filtered to only show the record with index 0 in the other table. If I were to select index 2 (01/03/2022) in the Daily Date Data table, then the Ranged Date Data table should be filtered to only show indices 0 and 1.
In the model view, when trying to build this relationship, I can create a relationship from DailyDates.Date to DateRanges.StartDate and then from DailyDates.Date to DateRanges.EndDate; however, only a single relationship can be active, so the highlight and slicer functionality will not give me the results I'm looking for.
As you can see from this demo, the datasets are small; however, my actual datasets contain around 50 million records in the Daily table and 10+ million records in the Ranged table, so I'm hoping there is an efficient way of getting this functionality that will not put too much load on memory.
Any advice on how to accomplish this would be greatly appreciated.
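For what it's worth, the match being described is a "between" (non-equi) join, which a physical Power BI relationship cannot express. Conceptually it looks like this in SQL (DailyDates, DateRanges, Date, StartDate and EndDate are the names used above; the Date column is quoted only because it is a reserved word in some dialects):

-- Each daily date matched to every range that contains it.
SELECT d."Date", r.StartDate, r.EndDate
FROM   DailyDates d
JOIN   DateRanges r
       ON d."Date" BETWEEN r.StartDate AND r.EndDate;

Typical workarounds are either expanding each range into one row per day (so an ordinary single-column relationship works) or applying the range filter in a measure; both trade memory or query cost against model simplicity.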

Multiple IDs to one fact table

I have the following example where the Code mapping table has two IDs: one relates to the current-year ID and the other to the prior-year ID. It is important to map them against each other so that we can do calculations such as current-year net revenue vs. prior-year net revenue.
I am struggling to figure out the best way to structure the data model versus how much to tackle with DAX. Any ideas on the best way to model it?
As I understand your question, current year and prior year are distinct dimensions, so your model should have two dimension tables that are linked separately to the fact table, rather than trying to link the fact table to the same dimension table twice. The source ClientcodeJobcodes table would be the same for each of the two dimensions but would appear twice in your data model (e.g. ClientcodeJobcodesCurrent and ClientcodeJobcodesPrior).
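In SQL terms, the two role-playing copies behave like joining the same mapping table twice under different aliases. In the sketch below only ClientcodeJobcodes comes from the answer; the fact table name and the ID/JobCode columns are assumed for illustration:

-- The fact table joins to the same mapping table twice, once per role.
SELECT f.*,
       cur.JobCode AS CurrentYearJobCode,
       pri.JobCode AS PriorYearJobCode
FROM   FactTable f
JOIN   ClientcodeJobcodes cur ON cur.ID = f.CurrentYearID
JOIN   ClientcodeJobcodes pri ON pri.ID = f.PriorYearID;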

DirectQuery - Very inefficient queries being generated to Snowflake

I am connecting PBI to Snowflake using DirectQuery. To keep it simple, I have two tables, a product dimension table and a sales fact table. There are 3.7M rows in the product dimension table and 100M in the sales fact table. I also have a measure that calculates total sales which uses SUM to sum a column in the fact table.
I create a table visual in PBI and put the product description as the first column. The query generated by PBI is good. It retrieves 501 rows and displays them. So far, so good. Next I put the total sales measure as the second column. Now PBI generates several queries retrieving 1,000,001 rows. Of course I get an error stating the 1M row limit for DirectQuery has been reached.
This should not be happening. Has anyone run into something like this? Is there anything I can do?
I had a dig around, and there is a capability to adjust the limit if you have a Premium license:
https://powerbi.microsoft.com/en-gb/blog/five-new-power-bi-premium-capacity-settings-is-available-on-the-portal-preloaded-with-default-values-admin-can-review-and-override-the-defaults-with-their-preference-to-better-fence-their-capacity/