Dynamically changing columns in Power BI

I have an API data source I refresh daily to gather Power BI activity. Each day the data returns a different number of columns, so it might have 60 columns one day and 80 (20 additional) the next.
When I try to refresh the dataset in the Power BI Service, it naturally fails and states that the new columns cannot be found in the row set.
I have explored several options, such as creating a combined table, but I do not know all of the column names that could arrive each day, so that approach failed because it was too static. Does anyone know of a way to handle these daily changes dynamically?
Many thanks

The only way to refresh a data source that has changing schema is to unpivot that table and bring it into your model as key/value pairs.
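A minimal sketch of that key/value approach, assuming the activity data arrives in a step named Source and that a couple of columns (here "Id" and "CreationTime", purely placeholders) are present every day:

let
    Source = #"your API table",
    // unpivot everything except the columns that exist every day;
    // any new columns simply become extra Attribute/Value rows instead of breaking the schema
    KeyValue = Table.UnpivotOtherColumns(Source, {"Id", "CreationTime"}, "Attribute", "Value")
in
    KeyValue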

It depends on what you want to do.
If you're just trying to change all extra columns from type any to type number, you can try something like
let
    Source = #"table with extra columns",
    OtherColumnNames = List.RemoveItems(
        Table.ColumnNames(Source),
        #"List of known column names"
    ),
    #"Changed Type" = Table.TransformColumnTypes(
        Source,
        List.Transform(
            OtherColumnNames,
            each {_, type number}
        )
    )
in
    #"Changed Type"
Or, if it's something you will be doing to multiple tables, you can turn it into a function: create a query named "fTransformOtherColumnTypes" with the following code.
(
    #"List of known column names" as list,
    #"table with extra columns" as table,
    Type as type
) =>
let
    Source = #"table with extra columns",
    OtherColumnNames = List.RemoveItems(
        Table.ColumnNames(Source),
        #"List of known column names"
    ),
    #"Changed Type" = Table.TransformColumnTypes(
        Source,
        List.Transform(
            OtherColumnNames,
            each {_, Type}
        )
    )
in
    #"Changed Type"
and then your other queries can use it, e.g. fTransformOtherColumnTypes({"name","color","org", "alias"}, #"your source data", type number)
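For example, a query that calls the function might look like this (the source step name is a placeholder):

let
    Source = #"your source data",
    // type every column that is not in the known list as number
    Typed = fTransformOtherColumnTypes({"name", "color", "org", "alias"}, Source, type number)
in
    Typed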

Related

How to perform incremental refresh on table that is used as base for star schema modelling?

The incremental refresh concept is pretty straightforward when individual tables are loaded incrementally. Following are two scenarios in which a star schema is created (each explained below).
In scenario 1, the star schema is created from one table.
In scenario 2, the star schema is created from a union of two tables.
Is it possible to implement incremental refresh in these scenarios, and how?
Scenario 1:
In the Transform window I have tblA, a denormalized table. Assume I make a star schema from this table by doing the following for each dimension: duplicate tblA, remove unnecessary columns, remove duplicate rows, and add an index column.
Finally, merge each dimension table back into tblA to bring in the respective ID and remove the name column(s) from tblA.
How to implement incremental refresh in this scenario?
Scenario 2:
In the Transform window, I have created queries for two tables (tblA and tblB).
Then I have used a union to combine both tables into a new, third table (tblC).
From this third table I have created a star schema model with 4 dimension tables and 1 fact table.
There is a large amount of data, resulting in slow refresh times and load on the source SQL Server database.
How to implement incremental refresh in this scenario?
You shouldn't load tblA and tblB into the model at all, so you should configure incremental refresh on tblC.
To do this, the Power Query for tblC must filter on the RangeStart and RangeEnd parameters. These parameters may also be used directly in the queries for tblA and tblB.
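For reference, parameter queries created via Manage Parameters for incremental refresh typically look something like this in the Advanced Editor (the dates below are placeholders; treat this as a sketch):

// RangeStart
#datetime(2021, 1, 1, 0, 0, 0) meta [IsParameterQuery = true, Type = "DateTime", IsParameterQueryRequired = true]

// RangeEnd
#datetime(2021, 12, 31, 0, 0, 0) meta [IsParameterQuery = true, Type = "DateTime", IsParameterQueryRequired = true]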
To derive dimensions from the same table, just reference the base table multiple times: once with the RangeStart/RangeEnd filter and once for each dimension.
E.g., bring in FactInternetSales, turn off loading on that query, and reference it twice. First as SalesFact:
let
    Source = FactInternetSales,
    // OrderDateKey is an integer date key in YYYYMMDD form
    #"Filtered Rows" = Table.SelectRows(Source, each [OrderDateKey] >= Date.Year(RangeStart) * 10000 + Date.Month(RangeStart) * 100 + Date.Day(RangeStart)),
    #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each [OrderDateKey] < Date.Year(RangeEnd) * 10000 + Date.Month(RangeEnd) * 100 + Date.Day(RangeEnd))
in
    #"Filtered Rows1"
and once in DimProduct
let
    Source = FactInternetSales,
    #"Removed Other Columns" = Table.SelectColumns(Source, {"ProductKey"}),
    #"Removed Duplicates" = Table.Distinct(#"Removed Other Columns"),
    #"Inserted Merged Column" = Table.AddColumn(#"Removed Duplicates", "ProductName", each Text.Combine({"Product", Text.From([ProductKey], "en-US")}), type text)
in
    #"Inserted Merged Column"
Then the dimensions will be fully refreshed every time and the fact will be incrementally refreshed.

In a matrix table in Power BI, how to make sure that the table doesn't calculate subtotals and totals for duplicates?

I have a matrix table in Power BI where the lowest hierarchy level has two users with the same product, but for their manager it needs to be counted only once. How can I do that in the matrix table?
When I was pulling the hierarchy from one table and sales from another, Power BI was doing this on its own, but when sales is in the same table as the user hierarchy, it simply sums all the sales, when it should sum only once in cases where a product is repeated across multiple users under the same manager.
As seen in the image, the manager's total should be 300, but Power BI sums it to 400. How can I make sure the manager's total is shown as 300? I'd really appreciate any help. Thank you.
Simply put, you should remove the duplicate items related to manager "A" in the "Product" column. In the real scenario, you need to filter this way for each manager.
You can do this within Power Query:
(notice the table name 'SalesTable')
let
    Source = Excel.CurrentWorkbook(){[Name="SalesTable"]}[Content],
    #"Filtered Rows" = Table.SelectRows(Source, each [Manager] = "A"),
    #"Changed Type" = Table.TransformColumnTypes(#"Filtered Rows", {{"Manager", type text}, {"User", type text}, {"Product", Int64.Type}, {"Sales", Int64.Type}}),
    #"Duplicate Removed" = Table.Distinct(#"Changed Type", {"Product"}),
    Sales = #"Duplicate Removed"[Sales],
    CustomSUM = List.Sum(Sales)
in
    CustomSUM
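To handle every manager at once rather than filtering to "A", a hedged variation on the same idea is to keep distinct Manager/Product pairs before summing (table and column names reused from the sample above; treat this as a sketch):

let
    Source = Excel.CurrentWorkbook(){[Name="SalesTable"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"Manager", type text}, {"User", type text}, {"Product", Int64.Type}, {"Sales", Int64.Type}}),
    // keep one row per Manager/Product combination so repeated products count only once per manager
    #"Duplicates Removed" = Table.Distinct(#"Changed Type", {"Manager", "Product"}),
    // sum per manager; this is the total the matrix should show
    #"Grouped Rows" = Table.Group(#"Duplicates Removed", {"Manager"}, {{"Sales", each List.Sum([Sales]), type number}})
in
    #"Grouped Rows"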

Custom grouped column and removing identical rows

I have raw data like this:
There are other columns not shown, but disease site is the only one that changes within a study.
In Power BI, I ultimately want to show colleagues a report table that has just one row per study. Given the distinct disease sites, I obviously need to group those first, and I need some help doing so. What I'd like to show colleagues is a table like this:
Where, if there are multiple disease sites associated with a study, they are lumped together as "Multi". I figure doing so will mean creating a custom disease site column with "Multi" in it and then filtering to one row per study, but I'm having trouble with the details.
Do I do that in Power Query? Should I do it in Power BI after the query is imported? Any help would be appreciated, thank you!
Load your data into Power Query or similar.
Click to select the Study and Primary_Investigatory columns, right-click, choose Group By, and pick the All Rows operation.
Change the ending of the Table.Group step in the formula bar (or in Home > Advanced Editor) from
{"Primary_Investigatory", "Study"}, {{"data", each _, ... })
to
{"Primary_Investigatory", "Study"}, {{"data", each if Table.RowCount(_) = 1 then [Disease_Site]{0} else "Multi"}})
Sample full code for the example image:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"Primary_Investigatory", type text}, {"Study", type text}, {"Disease_Site", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Primary_Investigatory", "Study"}, {{"data", each if Table.RowCount(_) = 1 then [Disease_Site]{0} else "Multi"}})
in
    #"Grouped Rows"

Vlookup in M language and sum values

I'm trying to make a VLOOKUP in Power Query that also sums the multiple values found. I have two tables in my Power BI that are connected by the report number, as shown below. I need to create a new column on Table B that gets the sum of the cost in Table A according to the report numbers.
In Power Query I have created a new column on Table B using the following code:
After that I was planning to simply create a new column summing the list result, but my list is empty and I can't figure out why. Can anyone help me understand why I can't get the results?
I can't do this using DAX; it needs to be in M.
One way to add the column into TableB is:
= (i)=>List.Sum(Table.SelectRows(TableA, each [Report Num]=i[Report Num]) [Cost])
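For context, a minimal sketch of that formula wrapped in Table.AddColumn (the step names and the new column name "Cost" are placeholders):

let
    Source = Excel.CurrentWorkbook(){[Name="TableB"]}[Content],
    // add a Cost column by summing the matching rows from TableA for each report number
    AddedCost = Table.AddColumn(Source, "Cost", (i) => List.Sum(Table.SelectRows(TableA, each [Report Num] = i[Report Num])[Cost]), type number)
in
    AddedCost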
Another way is to group TableA and merge it in. I tend to think this is a faster method for larger tables:
let
    Source = Excel.CurrentWorkbook(){[Name="TableB"]}[Content],
    #"Grouped Rows" = Table.Group(TableA, {"Report Num"}, {{"Cost", each List.Sum([Cost]), type number}}),
    #"Merged Queries" = Table.NestedJoin(Source, {"Report Num"}, #"Grouped Rows", {"Report Num"}, "Table1", JoinKind.LeftOuter),
    #"Expanded Table1" = Table.ExpandTableColumn(#"Merged Queries", "Table1", {"Cost"}, {"Cost"})
in
    #"Expanded Table1"
Of course, if those are the only two columns in TableB, you could just create the whole table in one go:
let Source = Table.Group(TableA, {"Report Num"}, {{"Cost", each List.Sum([Cost]), type number}})
in Source

Make Calculated Table of Stops that looks at both Pickup and Delivery Stop on an Order

I do Power BI for a logistics company. We want to show performance by stop location. The data is currently a table of all orders by Order ID, so -- ID, Rev $, Pickup Stop, Delivery Stop. Everything is a 2-stop load, fortunately.
What I am struggling with is building a calculated table that looks at the Pickup Stop AND the Delivery Stop at the same time while ALSO respecting filters set on the page. I would like the stops table to say something like: Stop Location, X Pickups, $X Pickup Revenue, X Deliveries, $X Delivery Revenue.
How would I go about this? I've tried a number of approaches but every time it either misses filters or can only handle one stop at a time.
Thanks!
Current data (call it Orders):
The calculated table I'm trying to make (call it Stops):
One method of creating your Stops table, given your Orders table, is to use Power Query, accessed via Queries => Transform Data on the Power BI Home tab.
The Table.Group function is where the magic happens. Unfortunately, it needs to be done by coding in the Advanced Editor, as the UI does not provide for these custom aggregations.
When the PQ Editor opens: Home => Advanced Editor
The first three steps should be replaced by however you read in your own Orders table.
Paste the rest of the M code below in place of whatever follows those setup lines in your own query.
Read the comments and explore the Applied Steps to understand the algorithm
M Code
let
//Input data and set datatypes
//These lines should be replaced with whatever you need to
//set up your data table
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("bYzBCoMwEER/Zck5BxPRu1RaLJaW6qEQPIS4tEExoonQv+/a0oLQyw5vZnaUYuepxQmKnHFWO697uOKCQ0DiizVdGKHybiTKsbcLTs8PN1wxIZMooiR938z3evCawyFbKczeDhzq268qyBZpsg23f9+qJF+Skuwe1ui741CU/2djsmO53lJ3SFsth/3aPWrTzY7Kp4o1zQs=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t, Column3 = _t, Column4 = _t]),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
dataSource = Table.TransformColumnTypes(#"Promoted Headers",{
{"Order ID", Int64.Type}, {"Total Revenue", Int64.Type},
{"Pickup Stop", type text}, {"Delivery Stop", type text}}),
//Unpivot to get single column of Stops
#"Unpivoted Columns" = Table.UnpivotOtherColumns(dataSource, {"Order ID", "Total Revenue"}, "Attribute", "Stop"),
//Group by stop and do the aggregations
#"Grouped Rows" = Table.Group(#"Unpivoted Columns", {"Stop"}, {
{"Orders Picked Up", (t)=> List.Count(List.Select(t[Attribute], each _ = "Pickup Stop" )), Int64.Type},
{"Total Revenue Picked Up", (t)=> List.Sum(Table.SelectRows(t, each [Attribute]="Pickup Stop")[Total Revenue]), type number},
{"Orders Delivered", (t)=> List.Count(List.Select(t[Attribute], each _ = "Delivery Stop" )), Int64.Type},
{"Total Revenue Delivered", (t)=> List.Sum(Table.SelectRows(t, each [Attribute]="Delivery Stop")[Total Revenue]), type number}
})
in
#"Grouped Rows"
Orders (input) and Stops (result) were shown as images in the original post.