Clean up table in Power BI

I am attempting to load several Excel files into Power BI. These files are pretty small (<= ~1k rows). One of these sources must be cleaned up. In particular, one of its columns has some bad data. The correct data is stored in another Excel file. For example:
table bad:
ID col1
1 0
2 0.5
3 2
4 -3
table correct:
ID colx
2 1
4 5
desired output:
ID col1
1 0
2 1
3 2
4 5
In SQL or other data visualization tools, I would left join the bad table to the clean table and then coalesce the bad and correct values. I know I have some options for implementing this in Power BI: one is in the query editor (i.e., M), another is in the data model (i.e., DAX). Which option is best, and what would the implementation look like (e.g., if M, what does the query look like)?

While you can do this in DAX, I'd suggest doing it in the query editor. The steps would look roughly like this:
Merge the Correct table into the Bad table using a left outer join on the ID columns.
Expand the merged Correct table to get just the Colx column.
Create a custom column to pick the values you want (Add Column > Custom Column):
if [Colx] = null then [Col1] else [Colx]
You can remove the Col1 and Colx columns if you want, or just keep them. If you delete Col1, you can rename the new custom column (Col2 in the code below) to Col1.
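As a side note, newer versions of Power Query also support a coalesce operator, so (assuming your version has it) the custom column formula above could likely be shortened to:
[Colx] ?? [Col1]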
If you don't want the source tables floating around, you can do all of the above in a single query similar to this:
let
BadSource = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTJQitWJVjICsfRMwWxjINsIzDIBsnSNlWJjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [ID = _t, Col1 = _t]),
CorrectSource = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlLSUTJUitWJVjIBskyVYmMB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [ID = _t, Colx = _t]),
Bad = Table.TransformColumnTypes(BadSource,{{"ID", Int64.Type}, {"Col1", type number}}),
Correct = Table.TransformColumnTypes(CorrectSource,{{"ID", Int64.Type}, {"Colx", type number}}),
#"Merged Queries" = Table.NestedJoin(Bad,{"ID"},Correct,{"ID"},"Correct",JoinKind.LeftOuter),
#"Expanded Correct" = Table.ExpandTableColumn(#"Merged Queries", "Correct", {"Colx"}, {"Colx"}),
#"Added Custom" = Table.AddColumn(#"Expanded Correct", "Col2", each if [Colx] = null then [Col1] else [Colx]),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Col1", "Colx"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Col2", "Col1"}})
in
#"Renamed Columns"

Related

Power BI - power query editor - combine several rows in a table

Given this sample raw table (there are some more columns..):
agg_group        count_%
CHARGED_OFF      1.2
DELINQUENT       1.8
ELIGIBLE         90
MERCHANT_DELINQ  7
NOT_VERIFIED     0
How can I transform this table to create 2 new columns, using either DAX in Power BI Desktop or Power Query?
Desired result:
agg_group   outstanding_principal
ELIGIBLE    90
DELINQUENT  10
Here is a Power Query sample that does this, paste the following into a blank query to see the steps, based on your sample data:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("JcsxDoAgDIXhu3QmRl3UUaVAEyyRoAsh3P8WNnX93v9qhTPs2aPtyTkwMA0zNFPBYiS+H+SiuCqKeToiCm2jyoVZ/lz638uwqHMq/cVMjtAKStw+", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [agg_group = _t, #"count_%" = _t]),
#"Cleaned Text" = Table.TransformColumns(Source,{{"count_%", each Number.FromText(_, "en-US"), type text}}),
#"Added Conditional Column" = Table.AddColumn(#"Cleaned Text", "agg_group_", each if [#"agg_group"] = "ELIGIBLE" then "ELIGIBLE" else "DELINQUENT"),
#"Grouped Rows" = Table.Group(#"Added Conditional Column", {"agg_group_"}, {{"outstanding_principal", each List.Sum([#"count_%"]), type text}}),
#"Renamed Columns" = Table.RenameColumns(#"Grouped Rows",{{"agg_group_", "agg_group"}})
in
#"Renamed Columns"
Result:

Power Query - Add custom column based on values found in another lookup Query

I'd like to add a column to my Main Query and fill it with values found in another Lookup Query, based on whether the matching unique value is embedded in the text of Column1 in my Main Query. If no match is found, just return blank/null. Thank you.
Main Query
Column1
...111.
..ABC..
..34C..
...xyz.
yyy.....
Lookup Query
Uniques
34C
ABC
111
Desired Output in Main Query
Column1    New Column
...111..   111
..ABC...   ABC
..34C...   34C
....xyz.
yyy.....
Here's another method using List.Accumulate to loop through the lookup list. Not sure which of the methods will be more efficient.
Note: If there might be more than one lookup result for a given item in Column1, a small change in the List.Accumulate function can accommodate showing them all with a separator (a sketch of that variation follows the code and results below).
let
//Read in Main Query
Source = Excel.CurrentWorkbook(){[Name="Main"]}[Content],
Main = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
//Read in Lookup Query as a list of text items
Lookup = List.Buffer(
List.Transform(Excel.CurrentWorkbook(){[Name="Lookup"]}[Content][Uniques],
each Text.From(_))),
//add Column
#"Add Column" =
Table.AddColumn(
Main, "New Column", each
List.Accumulate(Lookup,
null,
(state, current)=>
if Text.Contains([Column1], current)
then current else state),
type nullable text)
in
#"Add Column"
Results
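For example, here is a minimal sketch of that variation. It assumes a semicolon as the separator (that choice is mine, not from the question) and simply replaces the #"Add Column" step above:
//add Column, concatenating every matching lookup value with "; " as a separator
#"Add Column" =
    Table.AddColumn(
        Main, "New Column", each
            List.Accumulate(Lookup,
                null,
                (state, current)=>
                    if Text.Contains([Column1], current)
                    then (if state = null then current else state & "; " & current)
                    else state),
        type nullable text)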
There's not a built-in join that does this, but you can do a cross join followed by a filter. To cross join in Power Query, you add a custom column to one table containing the full value of the other table, then run Table.ExpandTableColumn. E.g.:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSjE0NFSK1YlWKk5Jc3RyNrY0BvMSs4uzgDLZxYlZKWnZxibOSrGxAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Lookup),
#"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Uniques"}, {"Uniques"}),
#"Selected Rows" = Table.SelectRows(#"Expanded Custom", each Text.Contains([Column1], [Uniques]))
in
#"Selected Rows"

Power BI | Power Query: how to create multiple columns from one single value column

I am currently using Power BI (Power Query) to clean up a dataset, and I am having some problems unpivoting a table the right way (see image below). Any suggestions on how to sort this out?
For pivoting to work you need three columns:
Key  Column  Value
1    A       1.3
1    B       3
1    C       New
2    A       2.3
2    B       3
2    C       Old
So this could be pivoted to:
Key  Column A  Column B  Column C
1    1.3       3         New
2    2.3       3         Old
So for this you would need to add a "column" column, where each row's value is the name of the column it should end up in, prior to pivoting the table.
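For illustration, here is a minimal sketch of that pivot, assuming the Key/Column/Value table above is already in place. The sample rows are hypothetical, the values are typed as text, and the pivoted columns come out named A, B, C rather than Column A, Column B, Column C. Paste it into a blank query to see the steps:
let
    // hypothetical sample matching the Key / Column / Value table above
    Source = Table.FromRecords({
        [Key = 1, Column = "A", Value = "1.3"],
        [Key = 1, Column = "B", Value = "3"],
        [Key = 1, Column = "C", Value = "New"],
        [Key = 2, Column = "A", Value = "2.3"],
        [Key = 2, Column = "B", Value = "3"],
        [Key = 2, Column = "C", Value = "Old"]
    }),
    // pivot the distinct values of the Column column into new columns,
    // filling each with the matching Value
    #"Pivoted Column" = Table.Pivot(Source, List.Distinct(Source[Column]), "Column", "Value")
in
    #"Pivoted Column"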
Edit:
If your rows are ordered according to columns per key, you can do something like this, where you create your own "column" column prior to pivoting. Here is code you can paste into a blank query:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("Tcq7DQAgCAXAXV5tI0zAZwvC/muYaCSvu+KqsLFgZuj17O7jiBhn5rXQF/pCX+grfaWv9PX/Pg==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [id = _t, values = _t]),
#"Added Index" = Table.AddIndexColumn(Source, "Index", 1, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each Number.Mod([Index],4)),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Index"}),
#"Pivoted Column1" = Table.Pivot(Table.TransformColumnTypes(#"Removed Columns", {{"Custom", type text}}, "nb-NO"), List.Distinct(Table.TransformColumnTypes(#"Removed Columns", {{"Custom", type text}}, "nb-NO")[Custom]), "Custom", "values")
in
#"Pivoted Column1"

Duplicate Rows in Power BI multiple times by a number indicated in another column

The idea of the problem I am having is that in Power BI I have a table like:
col1 col2
entry1 1
entry2 2
entry3 1
I would like to create a table of the form:
col1
entry1
entry2
entry2
entry3
That is, you duplicate each row the number of times specified in a different column. In my actual case, my table has many other columns whose values should also be duplicated in each row.
I would like to be able to do this using Power Query.
Thanks
You can add a custom column to your table with the formula
List.Repeat( { [col1] }, [col2] )
This produces a column with a list in each row, where the elements of the list are [col1] repeated [col2] times.
From there, you can expand that list into rows using the button on the table.
Here's what the full M code looks like:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSs0rKao0VNJRMlSK1YFyjYBcIwTXGCIbCwA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [col1 = _t, col2 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"col1", type text}, {"col2", Int64.Type}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each List.Repeat({[col1]},[col2])),
#"Expanded Custom" = Table.ExpandListColumn(#"Added Custom", "Custom")
in
#"Expanded Custom"
From here, you can pick either col1 or Custom and delete the other columns if you choose.
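For example, here is a minimal sketch of that last cleanup, assuming you keep the expanded Custom column: add these steps after #"Expanded Custom" above and return #"Renamed Columns" instead:
// keep only the expanded Custom column, then rename it back to col1
#"Removed Other Columns" = Table.SelectColumns(#"Expanded Custom", {"Custom"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns", {{"Custom", "col1"}})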

Power BI - Stack chart sort by

I have the below chart, which is sorted by the count for each of the bins. I am trying to sort it by days open instead, starting from 0 - 5 days, then 5 - 10 days, etc.
I have added another table that has IDs for each of the bins (0 - 5 days is 1, 5 - 10 days is 2, etc.), but I am unable to use it for sorting.
Any ideas?
I always do this by adding a dimension table for sorting purposes. The Dim table for bins would look like this:
Then go to the Data pane and set it up as shown in the picture below:
Select the Bin name column.
Choose Modeling from the menu.
Choose Sort by column and pick the Bin order column.
Then connect the Dim table to the fact table:
When building the visual, choose Bin name from the Dim table, not the Fact table!
Then the final thing is to set up sorting in the visual:
Here are the Dim and Fact tables so you can reproduce the exercise.
Dim Table:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcsrMUzDQNVXSUTJUitWB8E11DQ2AAkZwAUMDXSOQiDFcxAioByRigtBkoGsIVmSK0GZkoA0UMFOKjQUA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [#"Bin name" = _t, #"Bin order" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Bin name", type text}, {"Bin order", Int64.Type}})
in
#"Changed Type"
Fact Table:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jdAxDoAgDAXQq5iu0qQtVmX1GoQDuHj/0SoJCWVh5KefR8kZrvtZCBUCMJRQz4pMFkgLmFC+ZGuJWKefUUL+h6zbekJrd3OV1EvHhBSnJHLU6ak0QaWRil5SBw2/pwPEMzvtHrLnlRc=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [#"Bin name" = _t, Frequency = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Bin name", type text}, {"Frequency", Int64.Type}})
in
#"Changed Type"
You should be able to do a Sort by Column under the Modeling tab where you sort your bin name column by the ID value column.
You need to:
Connect the bin columns (0 - 5, 5 - 10, etc.) from the two tables in your relationships.
In your second table, add a column called order: 1, 2, 3 for the bins 0 - 5, 5 - 10, and so on.
This should work.