Convert categorical variables to numeric PowerQuery


I have several columns in my table whose text values fall into categories. For example, column "ABC" has 9000 rows, but every row holds a value from the set {"A","B","C"}. Other columns, such as Gender, hold "M"/"F"/null.
For each column, I'd like to convert the values into integers in place, so A:1, B:2, C:3, etc.
I've tried using List.Distinct to extract the values into a temp table, adding an index column to it, and then joining to transform the original column based on that mapping. However, this seems slow, and I'm not sure how to run it over all columns in my table (or at least over Table.ColumnsOfType(Source, {type nullable text}), to select the categorical columns).
Any suggestions?
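Before
+--------+-------+-------+
| Gender | Fruit | [...] |
+--------+-------+-------+
| F      | Cat   |       |
| F      | Dog   |       |
| M      | Lemon |       |
| M      | Dog   |       |
| M      | Lemon |       |
| null   | Cat   |       |
| M      | Dog   |       |
+--------+-------+-------+
After
+--------+-------+-------+
| Gender | Fruit | [...] |
+--------+-------+-------+
| 1      | 1     |       |
| 1      | 2     |       |
| 2      | 3     |       |
| 2      | 2     |       |
| 2      | 3     |       |
| null   | 1     |       |
| 2      | 2     |       |
+--------+-------+-------+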

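In Power Query, this seems to work for any number of columns:
Replace all nulls with a placeholder, here +=+ (unpivoting would otherwise drop them; note that the placeholder then gets its own number like any other value)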
Add Index
Unpivot
Remove duplicates
Group, add index to each group
Merge back into original and expand
Repivot
Remove extra columns
Full code:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Replaced Value" = Table.ReplaceValue(Source,null,"+=+",Replacer.ReplaceValue,Table.ColumnNames(Source)),
#"Added Index" = Table.AddIndexColumn(#"Replaced Value", "Index", 0, 1),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Added Index", {"Index"}, "Attribute", "Value"),
// derive a table of replacements
#"Removed Duplicates" = Table.Distinct(#"Unpivoted Other Columns", {"Attribute", "Value"}),
#"Grouped Rows" = Table.Group(#"Removed Duplicates", {"Attribute"}, {{"GRP", each Table.AddIndexColumn(_, "Index2", 1, 1), type table}}),
#"Expanded GRP" = Table.ExpandTableColumn(#"Grouped Rows", "GRP", {"Value", "Index2"}, {"Value", "Index2"}),
//replace originals
#"Merged Queries" = Table.NestedJoin(#"Unpivoted Other Columns",{"Attribute", "Value"},#"Expanded GRP",{"Attribute", "Value"},"EG",JoinKind.LeftOuter),
#"Expanded Table1" = Table.ExpandTableColumn(#"Merged Queries", "EG", {"Index2"}, {"Index2"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Table1",{"Value"}),
#"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[Attribute]), "Attribute", "Index2", List.Sum),
#"Removed Columns1" = Table.RemoveColumns(#"Pivoted Column",{"Index"})
in #"Removed Columns1"
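
As an alternative sketch (not the method used in the answer above), each text column can be encoded in place without unpivoting. It assumes the categorical columns are typed as nullable text, numbers the categories in order of first appearance, and leaves nulls as null. The sample table and the EncodeColumn helper are hypothetical names introduced for illustration.

```powerquery
let
    // Hypothetical sample data standing in for the asker's table
    Source = Table.FromRecords(
        {
            [Gender = "F", Fruit = "Cat"],
            [Gender = "F", Fruit = "Dog"],
            [Gender = "M", Fruit = "Lemon"],
            [Gender = null, Fruit = "Cat"]
        },
        type table [Gender = nullable text, Fruit = nullable text]
    ),
    // Names of the columns treated as categorical
    TextColumns = Table.ColumnsOfType(Source, {type nullable text}),
    // Replace the values of one column with their 1-based category position
    EncodeColumn = (tbl as table, col as text) as table =>
        let
            // Distinct non-null values, in order of first appearance
            Categories = List.Buffer(List.Distinct(List.RemoveNulls(Table.Column(tbl, col)))),
            Encoded = Table.TransformColumns(
                tbl,
                {{col, each if _ = null then null else List.PositionOf(Categories, _) + 1, Int64.Type}}
            )
        in
            Encoded,
    // Apply the encoding to every categorical column in turn
    Result = List.Accumulate(TextColumns, Source, EncodeColumn)
in
    Result
```

Because List.Accumulate passes the running table and the current column name to EncodeColumn, the same helper covers any number of columns. List.PositionOf scans the category list for every row, which should be acceptable for small category sets like the ones described in the question.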

Related

Power BI: Subtract current row from previous row

Hi, I have the following requirement.
I have a table named forecast table with forecast versions and values.
I first need to calculate the sum of values for each forecast, and then subtract each forecast from the next one (for 2022.7, take 2022.8 and subtract 2022.7; for 2022.8, take 2022.9 and subtract 2022.8; and so on).
Finally, I have to build a graph of the differences.
The full requirement is in the file linked below.
https://github.com/samantha280/powerbi/blob/main/Book2.xlsx

In Power Query / M, on your forecast table:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Forecast", type number}, {"Vals", Int64.Type}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Forecast"}, {{"Vals", each List.Sum([Vals]), type nullable number}}),
#"Added Index" = Table.AddIndexColumn(#"Grouped Rows", "Index", 0, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each try [Vals]-#"Added Index"{[Index]-1}[Vals] otherwise null),
#"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([Custom] <> null)),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Vals", "Index"})
in #"Removed Columns"
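
The query above attaches each difference to the later forecast (current row minus previous row). If the difference should instead sit on the earlier forecast, as the question describes (2022.8 minus 2022.7 shown against 2022.7), a minimal sketch is to look one row ahead instead. The sample totals and the Diff column name are made up for illustration.

```powerquery
let
    // Hypothetical totals per forecast version, standing in for the grouped table
    Grouped = Table.FromRecords({
        [Forecast = 2022.7, Vals = 100],
        [Forecast = 2022.8, Vals = 130],
        [Forecast = 2022.9, Vals = 125]
    }),
    Indexed = Table.AddIndexColumn(Grouped, "Index", 0, 1),
    // Next row's total minus this row's total; the last row has no successor,
    // so the out-of-range lookup fails and try ... otherwise yields null
    WithDiff = Table.AddColumn(
        Indexed,
        "Diff",
        each try Indexed{[Index] + 1}[Vals] - [Vals] otherwise null
    ),
    // Drop the last row (no difference) and the helper index
    Result = Table.RemoveColumns(Table.SelectRows(WithDiff, each [Diff] <> null), {"Index"})
in
    Result
```

With these sample values, 2022.7 gets 30 and 2022.8 gets -5.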

Need to split a column of values into multiple columns, based on the id they are referring to

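Basically, I have this dummy table:
+-----------+----------+
| People    | Group ID |
+-----------+----------+
| Albert    | 1        |
| Bernard   | 1        |
| Charles   | 2        |
| Daniel    | 2        |
| Elizabeth | 3        |
| Francis   | 3        |
+-----------+----------+
And what I would like to have is this:
+-----------+----------+----------+
| People 1  | People 2 | Group ID |
+-----------+----------+----------+
| Albert    | Bernard  | 1        |
| Charles   | Daniel   | 2        |
| Elizabeth | Francis  | 3        |
+-----------+----------+----------+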
I tried pivoting and unpivoting here and there to no avail. Any ideas?

In Power Query:
Right-click the Group ID column and choose Group By...
Accept the default options and click OK
Change the last part of the formula in the formula bar (or in Home > Advanced Editor) from
= Table.Group(Source, {"Group ID"}, {{"Count", each Table.RowCount(_), type number}})
to
= Table.Group(Source, {"Group ID"}, {{"Count", each Text.Combine(List.Transform([People], Text.From), ","), type text}})
This combines the People values of each group into a single text column, separated by commas.
Then right-click that column and choose Split Column > By Delimiter, splitting at each occurrence of a comma.
Full sample code:
let Source = Excel.CurrentWorkbook(){[Name="Table5"]}[Content],
#"Grouped Rows" = Table.Group(Source, {"Group ID"}, {{"Count", each Text.Combine(List.Transform([People], Text.From), ","), type text}}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Grouped Rows", "Count", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Count.1", "Count.2"})
in #"Split Column by Delimiter"
//fancy version that includes Column titles and auto adjusts for dynamic number of columns
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Grouped Rows" = Table.Group(Source, {"Group ID"}, {{"Count", each Text.Combine(List.Transform([People], Text.From), ","), type text}}),
DynamicColumnList = List.Transform({1..List.Max(Table.AddColumn(#"Grouped Rows", "Custom", each List.Count(Text.PositionOfAny([Count], {","}, Occurrence.All)))[Custom])+1}, each "Person." & Text.From(_)),
#"Split Column by Delimiter" = Table.SplitColumn(#"Grouped Rows","Count",Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),DynamicColumnList )
in #"Split Column by Delimiter"
Another way:
Right-click the Group ID column and choose Group By...
Accept the default options and click OK
Change the last part of the formula in the formula bar (or in Home > Advanced Editor) from
= Table.Group(Source, {"Group ID"}, {{"Count", each Table.RowCount(_), type number}})
to
= Table.Group(Source, {"Group ID"}, {{"count", each Table.AddIndexColumn(_, "Index", 1, 1), type table}})
Use the expand arrows at the top of the new column and expand [x] People and [x] Index
Select the Index column, choose Transform > Pivot Column, pick People as the values column and, under Advanced options, choose Don't Aggregate
Full sample code:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
Group = Table.Group(Source, {"Group ID"}, {{"count", each Table.AddIndexColumn(_, "Index", 1, 1), type table}}),
#"Expanded count" = Table.ExpandTableColumn(Group, "count", {"People", "Index"}, {"People", "Index"}),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Expanded count", {{"Index", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Expanded count", {{"Index", type text}}, "en-US")[Index]), "Index", "People")
in #"Pivoted Column"

power bi how to count Safe / On Risk value for multi columns

How can I count the Safe / On Risk values across multiple columns and make a pie chart covering all of them?
The target is to calculate how many values are Safe and how many are On Risk across all columns.

In Power Query Editor, create a new table using the code below.
Replace your_table_name with the name of your original table.
let
Source = your_table_name,
#"Removed Other Columns" = Table.SelectColumns(Source,{"Helmet", "Goggels", "Shoes", "Gloves"}),
#"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Removed Other Columns", {}, "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Columns",{"Attribute"}),
#"Grouped Rows" = Table.Group(#"Removed Columns", {"Value"}, {{"Count", each Table.RowCount(_), Int64.Type}})
in
#"Grouped Rows"
Keeping Location in the result set:
let Source = your_table_name,
#"Removed Other Columns" = Table.SelectColumns(Source,{"Location", "Helmet", "Goggels", "Shoes", "Gloves"}),
#"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Removed Other Columns", {"Location"}, "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Columns",{"Attribute"}),
#"Grouped Rows" = Table.Group(#"Removed Columns", {"Location", "Value"}, {{"Count", each Table.RowCount(_), Int64.Type}})
in
#"Grouped Rows"
Now you can create your pie chart using this new table.

Filter a column in Power Query so that it contains only the last date of each year

Can anyone please advise on how to filter this column in Power Query so that it contains only the last date of each year?
So, this should contain only 3 rows:
31/12/2019
31/12/2020
31/03/2021

Try this
Add custom column to pull out the year
= Date.Year([EndDate])
Add custom column to pull out the max date for each matching year
= (i)=>List.Max(Table.SelectRows(#"Added Custom" , each [Year]=i[Year]) [EndDate])
Add custom column to check the two dates against each other
= if [EndDate]=[MaxDate] then "keep" else "remove"
Filter on that column
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"EndDate", type date}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Year", each Date.Year([EndDate])),
#"Added Custom2" = Table.AddColumn(#"Added Custom","MaxDate",(i)=>List.Max(Table.SelectRows(#"Added Custom" , each [Year]=i[Year]) [EndDate]), type date ),
#"Added Custom1" = Table.AddColumn(#"Added Custom2", "Custom", each if [EndDate]=[MaxDate] then "keep" else "remove"),
#"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([Custom] = "keep"))
in #"Filtered Rows"
~ ~ ~
Another way, probably better for larger lists:
Add custom column to pull out the year
= Date.Year([EndDate])
Group on year and take the Maximum of the EndDate Column
Merge that back to original data with left outer join and filter
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"EndDate", type date}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Year", each Date.Year([EndDate])),
#"Grouped Rows" = Table.Group(#"Added Custom", {"Year"}, {{"MaxDate", each List.Max([EndDate]), type date}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type",{"EndDate"}, #"Grouped Rows" ,{"MaxDate"},"Table2",JoinKind.LeftOuter),
#"Expanded Table2" = Table.ExpandTableColumn(#"Merged Queries", "Table2", {"MaxDate"}, {"MaxDate"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded Table2", each ([MaxDate] <> null))
in #"Filtered Rows"
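
If only the date column is needed in the result, the merge step can be skipped: group by year and keep the maximum date of each group. A minimal sketch, with made-up sample dates standing in for the EndDate column:

```powerquery
let
    // Hypothetical sample dates; the real query would start from the source table
    Source = Table.FromColumns(
        {{
            #date(2019, 6, 30),
            #date(2019, 12, 31),
            #date(2020, 12, 31),
            #date(2021, 1, 31),
            #date(2021, 3, 31)
        }},
        type table [EndDate = date]
    ),
    // Helper column to group on
    WithYear = Table.AddColumn(Source, "Year", each Date.Year([EndDate]), Int64.Type),
    // One row per year, holding the latest date found in that year
    Grouped = Table.Group(WithYear, {"Year"}, {{"EndDate", each List.Max([EndDate]), type date}}),
    Result = Table.SelectColumns(Grouped, {"EndDate"})
in
    Result
```

This keeps the last date actually present for each year, so a partial year such as 2021 still contributes its latest date (31/03/2021 here). If other columns of the original rows must survive, the merge-based version above is the safer choice.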

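This is an older thread, but for those who are looking for the easiest solution, you can use the filter below. Note that it keeps only rows dated 31 December, so the last date of a partial year (such as 31/03/2021 in the question) is dropped: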
= Table.SelectRows(#"Converted to Table", each [Date] = Date.EndOfYear( [Date] ) )
If you want to see the entire code in action, you can paste this in the advanced editor:
let
Source = List.Dates( #date( 2010, 1, 1), Duration.Days( #date( 2022, 12, 31) - #date( 2010, 1, 1 ) ) + 1 , #duration(1,0,0,0) ),
#"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), type table[ Date = Date.Type ] , null, ExtraValues.Error),
#"Filtered Rows" = Table.SelectRows(#"Converted to Table", each [Date] = Date.EndOfYear( [Date] ))
in
#"Filtered Rows"
Hope that helps!
Rick de Groot
https://gorilla.bi

Split a single column into multiple rows in PowerBI?

I'm trying to transform a single cell in a column into several rows. What technique or DAX should I use in Power BI to achieve this?
The table details are given below:
+------------------------------------------------+----------------+
| Time | Status |
+------------------------------------------------+----------------+
| TimeStamp (2019-01-02, 2019-01-03, 2019-01-04) | (Yes, Yes, No) |
+------------------------------------------------+----------------+
I want output like this:
+------------+----------+
| Time | Status |
+------------+----------+
| 2019-01-02 | Yes |
| 2019-01-03 | Yes |
| 2019-01-04 | No |
+------------+----------+
I have tried several solutions but have not been able to get this working in Power BI.

I used both Power Query and DAX to solve this problem.
First I created Book1 to get the Status column: I split the column by comma using the split-into-rows option (instead of split-into-columns) and added an index to it. Then I created Book2 to get only the Time column, split it by comma using the same split-into-rows option, and added an index to it.
Book 1: Power Query
let
Source = Csv.Document(File.Contents("C:\Users\PremChand\Desktop\stack\Book1.csv"),[Delimiter=",", Columns=2, Encoding=65001, QuoteStyle=QuoteStyle.None]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),
#"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true]),
#"Replaced Value" = Table.ReplaceValue(#"Promoted Headers","TimeStamp","",Replacer.ReplaceText,{"Time"}),
#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","(","",Replacer.ReplaceText,{"Time", "Status"}),
#"Replaced Value2" = Table.ReplaceValue(#"Replaced Value1",")","",Replacer.ReplaceText,{"Time", "Status"}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value2", {{"Status", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Status"),
#"Added Index" = Table.AddIndexColumn(#"Split Column by Delimiter", "Index", 0, 1),
#"Removed Columns" = Table.RemoveColumns(#"Added Index",{"Time"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Index", "Index1"}})
in
#"Renamed Columns"
Book 2: Power Query
let
Source = Csv.Document(File.Contents("C:\Users\PremChand\Desktop\stack\Book1.csv"),[Delimiter=",", Columns=2, Encoding=65001, QuoteStyle=QuoteStyle.None]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),
#"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true]),
#"Replaced Value" = Table.ReplaceValue(#"Promoted Headers","TimeStamp","",Replacer.ReplaceText,{"Time"}),
#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","(","",Replacer.ReplaceText,{"Time", "Status"}),
#"Replaced Value2" = Table.ReplaceValue(#"Replaced Value1",")","",Replacer.ReplaceText,{"Time", "Status"}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value2", {{"Time", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Time"),
#"Added Index" = Table.AddIndexColumn(#"Split Column by Delimiter", "Index", 0, 1),
#"Removed Columns" = Table.RemoveColumns(#"Added Index",{"Status"})
in
#"Removed Columns"
Once Book1 and Book2 have been created, a relationship is created between their index columns. Then the new Output table is created using DAX to join Book1 and Book2.
The DAX below is used to join the two tables:
Output = NATURALLEFTOUTERJOIN(Book2,Book1)
The Output table consists of the Time, Status, Index and Index1 columns; select only the Time and Status columns to show in a table.

The easiest way is to use Power Query. Open the Advanced Editor and paste the code below. You need to edit the part where you get the data:
let
GetData = (test as record) => let
textTimeStamp = Text.BetweenDelimiters(test[TimeStamp], "(", ")"),
tableTimeStamp = Table.FromList(Function.Invoke(Splitter.SplitTextByDelimiter(","), {textTimeStamp}),null,{"TimeStamp"}),
tableTimeStampIndex = Table.AddIndexColumn(tableTimeStamp, "Index"),
textAnswer = Text.BetweenDelimiters(test[Answer], "(", ")"),
tableAnswer = Table.FromList( Function.Invoke(Splitter.SplitTextByDelimiter(","), {textAnswer}),null,{"Answer"}),
tableAnswerIndex = Table.AddIndexColumn(tableAnswer, "Index"),
joinedTable = Table.Join(tableTimeStampIndex, "Index", tableAnswerIndex, "Index"),
removeIndex = Table.RemoveColumns(joinedTable,{"Index"})
in
removeIndex,
TimeSplit = let
Source = Csv.Document(File.Contents("C:\Users\...\Documents\TimeSplit.csv"),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),
#"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true])
in
#"Promoted Headers",
#"Invoked Custom Function" = Table.AddColumn(TimeSplit, "ToList", each GetData(_)),
#"Removed Columns" = Table.RemoveColumns(#"Invoked Custom Function",{"TimeStamp", "Answer"}),
#"Expanded ToList" = Table.ExpandTableColumn(#"Removed Columns", "ToList", {"TimeStamp", "Answer"}, {"TimeStamp", "Answer"})
in
#"Expanded ToList"
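
A shorter sketch of the same idea, using the question's column names (Time and Status): split both cells into lists and build a small table from the two lists for each row. The ToList helper is a hypothetical name, and the sketch assumes each cell holds its items inside one pair of parentheses, with both cells holding the same number of items.

```powerquery
let
    // Hypothetical one-row table mirroring the question's data
    Source = Table.FromRecords({
        [Time = "TimeStamp (2019-01-02, 2019-01-03, 2019-01-04)", Status = "(Yes, Yes, No)"]
    }),
    // Take the text between the parentheses, split on commas, trim the spaces
    ToList = (t as text) as list =>
        List.Transform(Text.Split(Text.BetweenDelimiters(t, "(", ")"), ","), Text.Trim),
    // For each source row, pair the two lists up as the columns of a nested table
    Paired = Table.AddColumn(
        Source,
        "Pairs",
        each Table.FromColumns({ToList([Time]), ToList([Status])}, {"Time", "Status"})
    ),
    // Stack the nested tables into one result
    Result = Table.Combine(Paired[Pairs])
in
    Result
```

The result has one row per timestamp with its matching status. Both columns are still text at this point, so a Table.TransformColumnTypes step would be needed to turn Time into a date.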
