I have a table of people records with various demographic information (Race, Ethnicity, Gender etc.).
For null values in [Ethnicity] (i.e., Hispanic Y/N), I want to search the corresponding [Race] value for the string "non-Hispanic", since for some records the two have been stored as a combined value under [Race] (e.g., "White (non-Hispanic)"). I'd like to clean/normalize both fields ([Race] is cleaned in a separate downstream step).
However, I'm unsure why my code is not successfully identifying matches to the first two conditions, since I know there are many instances of "White (non-Hispanic)" at the very least:
cleanData =
Table.ReplaceValue(rawData, each [Ethnicity], each
if [Ethnicity] = null and (
Text.Contains([Race],"non-Hispanic", Comparer.OrdinalIgnoreCase) or
Text.Contains([Race],"not Hispanic", Comparer.OrdinalIgnoreCase))
then "Non-hispanic" else
if [Ethnicity] = null and
Text.Contains([Race], "hispanic", Comparer.OrdinalIgnoreCase)
then "Hispanic" else
[Ethnicity], Replacer.ReplaceText, {"Ethnicity"}
),
Both fields are type Text, and I'm not hitting an error - just a lack of expected behavior. The null values in [Ethnicity] are unchanged.
Sample input:
Race                    Ethnicity
White                   Yes
Asian                   No
White (non-Hispanic)    Decline to respond
White (non-Hispanic)    null
White (Hispanic)        null
Asian                   null
Sample output:
Race                    Ethnicity
White                   Yes
Asian                   No
White (non-Hispanic)    Decline to Respond
White (non-Hispanic)    No
White (Hispanic)        Yes
Asian                   null
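The problem is Replacer.ReplaceText, which should be Replacer.ReplaceValue. Replacer.ReplaceText substitutes text inside a text value, so it has nothing to act on when the cell is null, whereas Replacer.ReplaceValue replaces the whole cell value.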
cleanData = Table.ReplaceValue(rawData, each [Ethnicity], each
if [Ethnicity] = null then
if (Text.Contains([Race],"non-Hispanic", Comparer.OrdinalIgnoreCase) or Text.Contains([Race],"not Hispanic", Comparer.OrdinalIgnoreCase))
then "Non-hispanic" else
if Text.Contains([Race], "hispanic", Comparer.OrdinalIgnoreCase) then "Hispanic" else [Ethnicity]
else [Ethnicity]
,Replacer.ReplaceValue,{"Ethnicity"}),
or
cleanData = Table.ReplaceValue(rawData, each [Ethnicity], each
if [Ethnicity] = null and
(Text.Contains([Race],"non-Hispanic", Comparer.OrdinalIgnoreCase) or Text.Contains([Race],"not Hispanic", Comparer.OrdinalIgnoreCase) )
then "Non-hispanic" else
if [Ethnicity] = null and Text.Contains([Race], "hispanic", Comparer.OrdinalIgnoreCase) then "Hispanic" else [Ethnicity]
,Replacer.ReplaceValue,{"Ethnicity"}),
@horseyride, you can abstract it to use a list of any size:
Text.ContainsAny = (source as text, searchStrings as list) as logical =>
let
matches = List.Transform( searchStrings, (string) =>
Text.Contains(source, string, Comparer.OrdinalIgnoreCase) )
in
List.AnyTrue(matches),
then you can write
if [Ethnicity] = null and Text.ContainsAny( [Race], {"non-hispanic", "not-hispanic"} ) then "Non-hispanic"
else ...
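For readers who want to check the matching logic outside Power Query, here is a minimal Python sketch of the same idea. The name contains_any is hypothetical and simply mirrors the Text.ContainsAny helper above; it is an illustration, not part of any library.

```python
def contains_any(source, search_strings):
    """Return True if any search string occurs in source, ignoring case."""
    lowered = source.casefold()
    # any() stops at the first match, like List.AnyTrue over the match list.
    return any(s.casefold() in lowered for s in search_strings)


print(contains_any("White (non-Hispanic)", ["non-hispanic", "not-hispanic"]))  # True
print(contains_any("Asian", ["non-hispanic", "not-hispanic"]))  # False
```

The list can be any length, which is the point of the abstraction: adding another spelling means adding one list item rather than another "or" clause.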
Here's another way, using Table.AddColumn and then a rename. That might sound worse than replacing a column in place, but after testing, I was getting the same final folded SQL query using either method.
let
Source = Table.FromRows(
Json.Document( Binary.Decompress( Binary.FromText( "i45WCs/ILElVUNJRikwtVorViVZyLM5MzAMJ+OWD+RAFGnn5eboemcUFiXmZyZogaZfU5JzMvFSFknyFotTigvy8FLzKkSUxJeCWoppRgm5GLAA=", BinaryEncoding.Base64 ), Compression.Deflate ) ),
let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Race = _t, Ethnicity = _t]
),
#"Changed Type" = Table.TransformColumnTypes( Source, {{"Race", type text}, {"Ethnicity", type nullable text}} ),
// remove control chars, then turn empty strings into null
#"Cleaned Text" = Table.TransformColumns( #"Changed Type", {{"Ethnicity", Text.Clean, type nullable text}}),
#"Replaced Value" = Table.ReplaceValue( #"Cleaned Text", "", null, Replacer.ReplaceValue, {"Ethnicity"} ),
merge_columns = Table.AddColumn(
#"Replaced Value",
"Ethnicity2",
(row) =>
let
ethnicity = row[Ethnicity],
isBlank = ethnicity = null or ethnicity = "",
race = row[Race],
replacement =
if (
Text.Contains(race, "non-Hispanic", Comparer.OrdinalIgnoreCase)
or Text.Contains(race, "not-Hispanic", Comparer.OrdinalIgnoreCase)
) then
"Non-Hispanic"
else if Text.Contains(race, "Hispanic", Comparer.OrdinalIgnoreCase) then
"Hispanic"
else
// no match in [Race]: leave the value as it was
ethnicity
in
if isBlank then
replacement
else
// keep existing, non-blank ethnicity values
ethnicity,
type nullable text
),
#"ReplaceColumns" = Table.RenameColumns(
Table.RemoveColumns(merge_columns, {"Ethnicity"}), {{"Ethnicity2", "Ethnicity"}}
)
in
#"ReplaceColumns"
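The row-level rule discussed in this thread can also be checked outside Power Query. The following is a minimal Python sketch, assuming the sample input from the question; the function name derive_ethnicity is hypothetical, and the output labels follow the code in the question ("Non-hispanic" / "Hispanic") rather than the Yes/No shown in the sample output.

```python
def derive_ethnicity(race, ethnicity):
    """Fill a missing ethnicity from the race text; keep existing values."""
    if ethnicity is not None and ethnicity != "":
        return ethnicity
    text = race.casefold()
    # Test the negated forms first, because they also contain "hispanic".
    if "non-hispanic" in text or "not hispanic" in text:
        return "Non-hispanic"
    if "hispanic" in text:
        return "Hispanic"
    # No hint in the race text: leave the value as it was.
    return ethnicity


rows = [
    ("White", "Yes"),
    ("Asian", "No"),
    ("White (non-Hispanic)", "Decline to respond"),
    ("White (non-Hispanic)", None),
    ("White (Hispanic)", None),
    ("Asian", None),
]
cleaned = [(race, derive_ethnicity(race, eth)) for race, eth in rows]
for row in cleaned:
    print(row)
```

The order of the two checks matters: testing for "hispanic" first would also match "non-Hispanic" and label those rows incorrectly.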
I'm trying to find the frequency of combinations that occur per an ID value.
Example given here: https://i.stack.imgur.com/ZG9gJ.png
The problem is that the number of rows that could make up a combination is variable, meaning a combination could consist of just 1 value or 2, 3, 4, etc.
I'm currently trying to do this within Power BI, but perhaps another tool would be more appropriate.
You can do this with Power Query (from Power BI => Transform)
Basic algorithm:
- Group by ID
  - For each subgroup, concatenate a sorted list of Cats into COMBI
  - Record the length of the combined string per ID for subsequent sorting
- Then group by COMBI
  - Aggregate with a Count function
M Code
let
//change next line to however you are getting your table
Source = Excel.CurrentWorkbook(){[Name="idCat"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Cat1", type text}}),
//group by ID and create COMBI
// and also length of each cat string for subsequent intelligent sorting
#"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {
{"COMBI", each Text.Combine(List.Sort([Cat1])),type text},
{"lenCat", each Text.Length(Text.Combine(List.Sort([Cat1]))),Int64.Type}
}),
maxLen = List.Max(#"Grouped Rows"[lenCat]),
#"Delete length column" = Table.RemoveColumns(#"Grouped Rows","lenCat"),
//Group by Cats for counting
#"Grouped Cats" = Table.Group(#"Delete length column",{"COMBI"},{
{"COUNT", each List.Count([COMBI]), Int64.Type}
}),
#"Pad COMBI for Sorting" = Table.TransformColumns(#"Grouped Cats",{"COMBI", each Text.PadStart(_,maxLen), type text}),
#"Sorted Rows" = Table.Sort(#"Pad COMBI for Sorting",{{"COMBI", Order.Ascending}}),
#"Trim Leading Spaces" = Table.TransformColumns(#"Sorted Rows",{"COMBI", each Text.Trim(_), type text})
in
#"Trim Leading Spaces"
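To check the grouping logic outside Power Query, the same algorithm can be sketched in Python. The (ID, category) rows below are hypothetical, since the original data is only shown as an image; sorting by length and then alphabetically gives the same order as the pad-then-sort steps in the M code.

```python
from collections import Counter, defaultdict

# Hypothetical (ID, category) rows standing in for the idCat table.
rows = [(1, "A"), (1, "B"), (2, "B"), (2, "A"), (3, "C"), (4, "A"), (4, "B"), (4, "C")]

# Step 1: collect the categories that belong to each ID.
cats_by_id = defaultdict(list)
for id_, cat in rows:
    cats_by_id[id_].append(cat)

# Step 2: sort within each ID so that A,B and B,A give the same combination,
# then count how often each combination occurs.
combos = Counter("".join(sorted(cats)) for cats in cats_by_id.values())

# Step 3: order shorter combinations first, then alphabetically.
ordered = sorted(combos.items(), key=lambda kv: (len(kv[0]), kv[0]))
print(ordered)  # [('C', 1), ('AB', 2), ('ABC', 1)]
```

Sorting inside each group is the key step: without it, IDs 1 and 2 would be counted as two different combinations.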
I have
ttOKLostTypes=Table.Group(#"Pivoted Column", {"Index"}, List.Transform(columnList2, each {_, (grp) => List.Max(Table.Column(grp, _)) })),
However this resets column types. How can I specify column types in the above transformation as here:
#"Grouped Rows" = Table.Group(#"Pivoted Column", {"Index"}, {{"InvoiceDate", each List.Max([InvoiceDate]),type nullable date},....
I know I can find out column types by using
schema=Table.Schema(#"Pivoted Column"),
but I cannot figure out how to build a proper list of column types to use in Table.Group().
You can build a dynamic list of all the aggregations to include the data type, using List.Transform, by just adding the data type to your transformation.
Assuming the data types are all the same:
For example, if your grouping column is "Column1", then
maxCols = List.RemoveItems(Table.ColumnNames(#"Changed Type"),{"Column1"}),
colAggregations =
List.Transform(
maxCols,
(c)=> {c, each List.Max(Table.Column(_,c)),Int64.Type}
),
group = Table.Group(#"Changed Type","Column1", colAggregations)
EDIT
To include the types of the original columns, dynamically, is more difficult. Table.Schema will return the column types as text so they have to be transformed into a Type.
One way to do this is with a custom function.
Custom Function
Name it: fnTextToType
I only included a few types. Each field name is a name returned by Table.Schema for a particular type, and the field value is the type itself. It is hopefully obvious how to extend this function to account for other types.
(txt as text) =>
let
typeRecord =
Record.Field(
[Number.Type = Number.Type,
Int64.Type = Int64.Type,
DateTime.Type = DateTime.Type],
txt
)
in
typeRecord
Then you can use it in code like this:
#"Changed Type" = Table.TransformColumnTypes(rem,{{"Column1", Int64.Type}, {"Column2", type number}, {"Column3", Int64.Type}}),
//get list of column types in column order
//note these are returned as text strings and not as "types"
colTypes = Table.Schema(#"Changed Type")[TypeName],
//create list of columns upon which to execute the aggregation (List.Max in this case)
maxCols = List.RemoveItems(Table.ColumnNames(#"Changed Type"),{"Column1"}),
//create list of aggregations
colAggregations =
List.Transform(maxCols,(c)=> {c, each List.Max(Table.Column(_,c)),
fnTextToType(colTypes{List.PositionOf(Table.ColumnNames(#"Changed Type"),c)})}),
//now group them
group = Table.Group(#"Changed Type","Column1", colAggregations)
in
group
You can see how the types were maintained in the screenshots below.
[Screenshots: the Changed Type step and the group step]
Thanks @Ron Rosenfeld. Your answer works, and it prompted me to find another way using Expression.Evaluate. Note that Expression.Evaluate does not work here without #shared as the environment. See https://blog.crossjoin.co.uk/2015/02/06/expression-evaluate-in-power-querym/
columnList = Table.ColumnNames(#"Pivoted Column"),
columnList2 = List.RemoveItems(columnList,{"Index"}),
ColListWithTypes = List.Transform(columnList2,(colName)=> {colName,Table.SelectRows(schema,each [Name]=colName)[TypeName]{0}}),
ttTestWithTypes=Table.Group(#"Pivoted Column", {"Index"}, List.Transform(ColListWithTypes, each {_{0}, (grp) => List.Max(Table.Column(grp, _{0})),Expression.Evaluate(_{1},#shared)})),
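Both answers come down to the same step: a type name stored as text has to be resolved to an actual type before it can be attached to an aggregation. As a loose analogy only, here is a minimal Python sketch with a lookup table playing the role of fnTextToType. The schema, column names, and rows are hypothetical, and Python converts the values here, whereas the type argument in Table.Group labels the resulting column.

```python
# Hypothetical schema: column name -> type name stored as text.
schema = {"Qty": "int", "Price": "float"}

# Lookup from type name to the real type, analogous to fnTextToType.
TYPE_LOOKUP = {"int": int, "float": float, "str": str}

rows = [
    {"Index": 1, "Qty": "3", "Price": "1.5"},
    {"Index": 1, "Qty": "7", "Price": "0.5"},
    {"Index": 2, "Qty": "2", "Price": "9.0"},
]


def group_max(rows, key, schema):
    """Group rows by key and keep the typed maximum of every schema column."""
    result = {}
    for row in rows:
        group = result.setdefault(row[key], {})
        for col, type_name in schema.items():
            value = TYPE_LOOKUP[type_name](row[col])
            if col not in group or value > group[col]:
                group[col] = value
    return result


grouped = group_max(rows, "Index", schema)
print(grouped)  # {1: {'Qty': 7, 'Price': 1.5}, 2: {'Qty': 2, 'Price': 9.0}}
```

An explicit lookup table fails loudly on an unknown type name, while evaluating the name against a global environment accepts anything that happens to be defined there; which trade-off is preferable depends on how much the schema is trusted.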
I'm new to Power BI and I'm looking for some help with a transformation.
What I'm trying to do with Power Query:
First, I want to group by the following columns: call_key, ivr_agent, cli, dnis, lang_id.
After that, I need to copy all the other information onto one row only.
The second row with the same call_key (and the other grouping columns) needs to go into new columns.
In short:
I need all rows with the same call_key to end up on one row only.
Excel test file: https://1drv.ms/x/s!AqE6W5akVSvUh59KfGmUiCSnZH6OVg
Thank you so much for your help,
Phil
I couldn't tell whether you need the rows spread into new columns or just merged into a single one.
To merge them into a single one, try this query:
let
Origen = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
RemoveBlanks = Table.SelectRows(Origen, each [call_key] <> null and [call_key] <> ""),
CombineCols = Table.AddColumn(RemoveBlanks, "MergedCol", each Text.Combine({Text.From([action_time], "es-CO"), [ivr_module], [action_location], [action_type], [action], [action_data1_desc], Text.From([action_data1_value], "es-CO"), [action_data2_desc], [action_data2_value], [action_data3_desc], Text.From([action_data3_value], "es-CO")}, "|"), type text),
RemoveCols = Table.SelectColumns(CombineCols,{"call_key", "ivr_agent", "cli", "dnis", "lang_id", "MergedCol"}),
GroupAndMerge = Table.Group(RemoveCols, {"call_key", "ivr_agent", "cli", "dnis", "lang_id"}, {{"New", each Text.Combine([MergedCol], "#(lf)"), type text}})
in
GroupAndMerge
EDIT: You may split it again, like this:
let
Origen = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
RemoveBlanks = Table.SelectRows(Origen, each [call_key] <> null and [call_key] <> ""),
CombineCols = Table.AddColumn(RemoveBlanks, "MergedCol", each Text.Combine({Text.From([action_time], "es-CO"), [ivr_module], [action_location], [action_type], [action], [action_data1_desc], Text.From([action_data1_value], "es-CO"), [action_data2_desc], [action_data2_value], [action_data3_desc], Text.From([action_data3_value], "es-CO")}, "|"), type text),
RemoveCols = Table.SelectColumns(CombineCols,{"call_key", "ivr_agent", "cli", "dnis", "lang_id", "MergedCol"}),
GroupAndMerge = Table.Group(RemoveCols, {"call_key", "ivr_agent", "cli", "dnis", "lang_id"}, {{"New", each Text.Combine([MergedCol], "#(lf)"), type text}}),
SplitColumn = Table.SplitColumn(GroupAndMerge, "New", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), {"New.1", "New.2", "New.3", "New.4", "New.5", "New.6", "New.7", "New.8", "New.9", "New.10", "New.11", "New.12", "New.13", "New.14", "New.15", "New.16", "New.17", "New.18", "New.19", "New.20", "New.21", "New.22", "New.23", "New.24", "New.25", "New.26", "New.27", "New.28", "New.29", "New.30", "New.31", "New.32", "New.33", "New.34", "New.35", "New.36", "New.37", "New.38", "New.39", "New.40", "New.41", "New.42", "New.43", "New.44", "New.45", "New.46", "New.47", "New.48", "New.49", "New.50", "New.51", "New.52", "New.53", "New.54", "New.55", "New.56", "New.57", "New.58", "New.59"})
in
SplitColumn
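If the goal is the other reading of the question (each extra row spread into new columns), the reshaping can be sketched as follows. This is a minimal Python illustration with hypothetical sample rows and only a subset of the columns named in the question; the numbered suffixes such as action.1 are an assumed naming scheme, not something taken from the workbook.

```python
from collections import defaultdict

# Hypothetical rows; column names follow the question, values are made up.
rows = [
    {"call_key": "k1", "cli": "100", "action": "start", "action_type": "menu"},
    {"call_key": "k1", "cli": "100", "action": "end", "action_type": "hangup"},
    {"call_key": "k2", "cli": "200", "action": "start", "action_type": "menu"},
]
key_cols = ["call_key", "cli"]
detail_cols = ["action", "action_type"]

# Group the rows by the key columns, preserving their original order.
groups = defaultdict(list)
for row in rows:
    groups[tuple(row[c] for c in key_cols)].append(row)

# Build one wide row per group: the n-th row's details get the suffix ".n".
wide = []
for key, members in groups.items():
    out = dict(zip(key_cols, key))
    for n, member in enumerate(members, start=1):
        for col in detail_cols:
            out[f"{col}.{n}"] = member[col]
    wide.append(out)

for row in wide:
    print(row)
```

Groups with fewer rows simply have fewer numbered columns, so a tabular output would need those missing cells filled with nulls.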
I have data in the following format, sample shown below:
ValA=101
ValB=2938
ValA=998
ValB=387
ValA=876
ValB=9832
I know that each set of ValA & ValB are a set of values that belong together, so output will be:
ValA ValB
101 2938
998 387
.......
.......
I need to get this into a tabular format so each valA ValB pair is one row.
I've tried doing this in Power Query by splitting on the = sign and then pivoting on the Val name, but it doesn't work.
Any idea how this might be easily achieved in Power Query?
Thanks!
I ended up doing the exact same as Lukasz, here's the full code:
let
Source = "ValA=101
ValB=2938
ValA=998
ValB=387
ValA=876
ValB=9832",
Custom1 = Lines.FromText(Source),
#"Converted to Table" = Table.FromList(Custom1, Splitter.SplitTextByDelimiter("="), null, null, ExtraValues.Error),
ChangedType = Table.TransformColumnTypes(#"Converted to Table",{{"Column1", type text}, {"Column2", Int64.Type}}),
CustomA = Table.AddColumn(ChangedType, "ValA", each if [Column1] = "ValA" then [Column2] else null),
CustomB = Table.AddColumn(CustomA, "ValB", each if [Column1] = "ValB" then [Column2] else null),
FilledDown = Table.FillDown(CustomB,{"ValA"}),
FilteredRows = Table.SelectRows(FilledDown, each [ValB] <> null)
in
FilteredRows
Lukasz's second idea using pivot columns looks like this:
let
Source = "ValA=101
ValB=2938
ValA=998
ValB=387
ValA=876
ValB=9832",
Custom1 = Lines.FromText(Source),
#"Converted to Table" = Table.FromList(Custom1, Splitter.SplitTextByDelimiter("="), null, null, ExtraValues.Error),
ChangedType = Table.TransformColumnTypes(#"Converted to Table",{{"Column1", type text}, {"Column2", Int64.Type}}),
AddedIndex = Table.AddIndexColumn(ChangedType, "Index", 0, 1),
IntegerDividedColumn = Table.TransformColumns(AddedIndex, {{"Index", each Number.IntegerDivide(_, 2), Int64.Type}}),
PivotedColumn = Table.Pivot(IntegerDividedColumn, List.Distinct(IntegerDividedColumn[Column1]), "Column1", "Column2")
in
PivotedColumn
The trick I found was to add a divided-by-two index column (that goes 0, 0, 1, 1, 2, 2...) so the pivot knows that the first two rows are related, then the next two, and so on.
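The integer-divided index can be illustrated outside Power Query as well. Here is a minimal Python sketch, assuming the sample data from the question and that every record consists of exactly two consecutive lines; the variable names are illustrative only.

```python
source = """ValA=101
ValB=2938
ValA=998
ValB=387
ValA=876
ValB=9832"""

records = {}
for index, line in enumerate(source.splitlines()):
    name, value = line.split("=")
    # index // 2 gives 0, 0, 1, 1, 2, 2: one shared key per pair of lines.
    records.setdefault(index // 2, {})[name] = int(value)

table = [records[i] for i in sorted(records)]
for row in table:
    print(row)
```

Without the shared key, a pivot has nothing that ties a ValA line to its ValB line, which would explain why pivoting on the name alone fails.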
You can do the following:
1) Create two new calculated columns with logic like: if Column1 contains ValA then Column1 else null. Use the same logic for ValB in the second column.
2) Use the fill down feature on the leftmost column. This will produce rows with values for both ValA and ValB in distinct columns.
3) Use the filter feature to filter out rows that have nulls in your two new columns.
That should give you what you want.
Edit: thinking about this more, you might also try splitting Column1 on the equals sign, then pivoting the new column; it should produce two columns with the discrete values. HTH.
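The three steps above (conditional columns, fill down, filter) can be sketched in Python to see why they pair the values correctly. This is a minimal illustration using the sample lines from the question; it assumes each ValB line follows the ValA line it belongs to.

```python
lines = ["ValA=101", "ValB=2938", "ValA=998", "ValB=387", "ValA=876", "ValB=9832"]

pairs = []
last_a = None
for line in lines:
    name, value = line.split("=")
    if name == "ValA":
        # "Fill down": remember the most recent ValA for the rows that follow.
        last_a = int(value)
    elif name == "ValB":
        # Keeping only ValB rows is the filter step; each carries its ValA.
        pairs.append((last_a, int(value)))

print(pairs)  # [(101, 2938), (998, 387), (876, 9832)]
```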