How to count row combinations based on common value - powerbi

I'm trying to find the frequency of the combinations of values that occur per ID.
Example given here: https://i.stack.imgur.com/ZG9gJ.png
The problem is that the number of rows that could make up a combination is variable, meaning a combination could consist of just 1 value or 2, 3, 4, etc.
I'm currently trying to do this within Power BI, but perhaps another tool would be more appropriate.

You can do this with Power Query (from Power BI => Transform).
Basic algorithm:
- Group by ID
- For each subgroup:
  - Concatenate a sorted list of Cats
  - Count the number of Cats per ID for subsequent sorting
- Then group by COMBI
- Aggregate with the Count function
M Code
let
//change next line to however you are getting your table
Source = Excel.CurrentWorkbook(){[Name="idCat"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Cat1", type text}}),
//group by ID and create COMBI
// and also the length of each cat string for subsequent intelligent sorting
#"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {
{"COMBI", each Text.Combine(List.Sort([Cat1])),type text},
{"lenCat", each Text.Length(Text.Combine(List.Sort([Cat1]))),Int64.Type}
}),
maxLen = List.Max(#"Grouped Rows"[lenCat]),
#"Delete length column" = Table.RemoveColumns(#"Grouped Rows","lenCat"),
//Group by Cats for counting
#"Grouped Cats" = Table.Group(#"Delete length column",{"COMBI"},{
{"COUNT", each List.Count([COMBI]), Int64.Type}
}),
#"Pad COMBI for Sorting" = Table.TransformColumns(#"Grouped Cats",{"COMBI", each Text.PadStart(_,maxLen), type text}),
#"Sorted Rows" = Table.Sort(#"Pad COMBI for Sorting",{{"COMBI", Order.Ascending}}),
#"Trim Leading Spaces" = Table.TransformColumns(#"Sorted Rows",{"COMBI", each Text.Trim(_), type text})
in
#"Trim Leading Spaces"
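The two grouping steps can also be tried without a workbook. The following is a minimal sketch, assuming a small inline sample table (the rows are made up for illustration) in place of the idCat table. It joins the sorted categories with a comma, so that multi-character category names cannot run together into an ambiguous string, and it leaves out the padding and sorting steps.

```powerquery
let
    // Hypothetical sample data standing in for the idCat table
    Source = #table(
        type table [ID = Int64.Type, Cat1 = text],
        {{1, "A"}, {1, "B"}, {2, "B"}, {2, "A"}, {3, "C"}, {4, "A"}, {4, "B"}}
    ),
    // One row per ID holding its sorted, comma-separated combination
    Combos = Table.Group(Source, {"ID"}, {
        {"COMBI", each Text.Combine(List.Sort([Cat1]), ","), type text}
    }),
    // Count how many IDs share each combination
    Counts = Table.Group(Combos, {"COMBI"}, {
        {"COUNT", each Table.RowCount(_), Int64.Type}
    })
in
    Counts
```

With this sample, IDs 1, 2 and 4 all reduce to the combination "A,B", and ID 3 is the only one with "C".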

Related

Power Query: recursive function to append elements in a table

I am trying to use a recursive function to append values from a list to a table; however, the code below only shows me the first and second results:
let
Source = {"second", "third", "forth", "fith", "seventh", "eighth"},
Count = List.Count(Source),
Table = Table.FromRecords({[sequence = "first"]}, type table[sequence = text]),
appendTbl = (x as list, n as number, tbl as table) =>
let
appTable = Table.InsertRows(Table, n, {[sequence = Source{n}]}),
Check = if n = (Count-1) then #appendTbl(x, n+1, appTable) else appTable
in
Check,
Result = appendTbl(Source, 0, Table)
in
Result
Can anyone please help me? Thanks!
It's kind of hard to tell whether you are using the number to designate the position in the table where you want to insert, or the number of times you want to duplicate the list before inserting it into the table.
That said, you can combine tables with Table.Combine() after converting the list to a table with Table.FromList(). If you need to append it multiple times, just use List.Repeat on the list. If you need to use the Count variable in your function, you can pass it in as an extra parameter: appendTbl = (x as list, n as number, tbl as table, count as number) =>
Some sample code that probably doesn't do exactly what you want:
let Source = {"second", "third", "forth", "fith", "seventh", "eighth"},
AppendCount=2, //# times to append the list onto the table
#"Converted to Table" = Table.FromList(List.Repeat(Source,AppendCount), Splitter.SplitByNothing(), null),
#"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "sequence"}}),
Table = Table.FromRecords({[sequence = "first"]}, type table[sequence = text]),
combined= Table.Combine({Table, #"Renamed Columns"})
in combined
or
let Source = {"second", "third", "forth", "fith", "seventh", "eighth"},
Table = Table.FromRecords({[sequence = "first"]}, type table[sequence = text]),
appendTbl = (x as list, n as number, tbl as table) => // append list x to table tbl, n times on column sequence
let #"Converted to Table" = Table.FromList(List.Repeat(x,n), Splitter.SplitByNothing(), null),
#"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "sequence"}}),
combined= Table.Combine({tbl, #"Renamed Columns"})
in combined,
Result = appendTbl(Source, 2, Table) // append Source to Table, 2 times
in Result
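If recursion is really what is wanted, the following is a minimal sketch of a recursive variant. It assumes the goal is to append each list element once, in order. Compared with the question's code, it refers to itself with the @ scoping operator, keeps recursing until the list is exhausted, and inserts into the tbl parameter rather than the outer table. The name Start is introduced here only for illustration.

```powerquery
let
    Source = {"second", "third", "forth", "fith", "seventh", "eighth"},
    Start = Table.FromRecords({[sequence = "first"]}, type table [sequence = text]),
    // Append x{n}, x{n+1}, ... to tbl, one row per recursive call
    appendTbl = (x as list, n as number, tbl as table) as table =>
        if n >= List.Count(x) then
            // Nothing left to append: return the accumulated table
            tbl
        else
            // Insert the next element at the end, then recurse via @
            @appendTbl(
                x,
                n + 1,
                Table.InsertRows(tbl, Table.RowCount(tbl), {[sequence = x{n}]})
            ),
    Result = appendTbl(Source, 0, Start)
in
    Result
```

For a plain append like this, the Table.Combine approach above is simpler; the recursive form is mainly useful when each step depends on the previous result.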

Power Query: Table.Group with a dynamic list of columns specifying column type

I have
ttOKLostTypes=Table.Group(#"Pivoted Column", {"Index"}, List.Transform(columnList2, each {_, (grp) => List.Max(Table.Column(grp, _)) })),
However this resets column types. How can I specify column types in the above transformation as here:
#"Grouped Rows" = Table.Group(#"Pivoted Column", {"Index"}, {{"InvoiceDate", each List.Max([InvoiceDate]),type nullable date},....
I know I can find out column types by using
schema=Table.Schema(#"Pivoted Column"),
but I cannot figure out how to build a proper list with column types to be used in Table.Group().
You can build a dynamic list of all the aggregations to include the data type, using List.Transform, by just adding the data type to your transformation.
Assuming the data types are all the same:
For example, if your grouping column is "Column1", then
maxCols = List.RemoveItems(Table.ColumnNames(#"Changed Type"),{"Column1"}),
colAggregations =
List.Transform(
maxCols,
(c)=> {c, each List.Max(Table.Column(_,c)),Int64.Type}
),
group = Table.Group(#"Changed Type","Column1", colAggregations)
EDIT
To include the types of the original columns, dynamically, is more difficult. Table.Schema will return the column types as text so they have to be transformed into a Type.
One way to do this is with a custom function.
Custom Function
name it: fnTextToType
I only included a few types. Each field name is a type name as returned by Table.Schema, and the field value is the corresponding type. Extending the function to cover other types follows the same pattern.
(txt as text) =>
let
typeRecord =
Record.Field(
[Number.Type = Number.Type,
Int64.Type = Int64.Type,
DateTime.Type = DateTime.Type],
txt
)
in
typeRecord
Then you can use it in code like this:
#"Changed Type" = Table.TransformColumnTypes(rem,{{"Column1", Int64.Type}, {"Column2", type number}, {"Column3", Int64.Type}}),
//get list of column types in column order
//note these are returned as text strings and not as "types"
colTypes = Table.Schema(#"Changed Type")[TypeName],
//create list of columns upon which to execute the aggregation (List.Max in this case)
maxCols = List.RemoveItems(Table.ColumnNames(#"Changed Type"),{"Column1"}),
//create list of aggregations
colAggregations =
List.Transform(maxCols,(c)=> {c, each List.Max(Table.Column(_,c)),
fnTextToType(colTypes{List.PositionOf(Table.ColumnNames(#"Changed Type"),c)})}),
//now group them
group = Table.Group(#"Changed Type","Column1", colAggregations)
in
group
You can see how the types were maintained by comparing the two steps (screenshots: Changed Type, group).
Thanks @Ron Rosenfeld. Your answer works, and it prompted me to find another way using Expression.Evaluate. Note that Expression.Evaluate without #shared does not work. See https://blog.crossjoin.co.uk/2015/02/06/expression-evaluate-in-power-querym/
columnList = Table.ColumnNames(#"Pivoted Column"),
columnList2 = List.RemoveItems(columnList,{"Index"}),
ColListWithTypes = List.Transform(columnList2,(colName)=> {colName,Table.SelectRows(schema,each [Name]=colName)[TypeName]{0}}),
ttTestWithTypes=Table.Group(#"Pivoted Column", {"Index"}, List.Transform(ColListWithTypes, each {_{0}, (grp) => List.Max(Table.Column(grp, _{0})),Expression.Evaluate(_{1},#shared)})),
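Another possibility, offered here as a sketch rather than as something from the answers above, is to skip the text round trip and read each column's type directly from the table type with Type.TableColumn. The sample table below is made up for illustration and stands in for the Pivoted Column step.

```powerquery
let
    // Hypothetical sample standing in for #"Pivoted Column"
    Source = #table(
        type table [Index = Int64.Type, InvoiceDate = nullable date, Amount = nullable number],
        {
            {1, #date(2020, 1, 15), 10.5},
            {1, #date(2020, 3, 1), 7},
            {2, #date(2019, 12, 31), 3}
        }
    ),
    // The table type carries the declared type of every column
    tableType = Value.Type(Source),
    columnList2 = List.RemoveItems(Table.ColumnNames(Source), {"Index"}),
    // Build {name, aggregation, type} triples, reusing each column's own type
    aggregations = List.Transform(
        columnList2,
        (colName) => {
            colName,
            (grp) => List.Max(Table.Column(grp, colName)),
            Type.TableColumn(tableType, colName)
        }
    ),
    Grouped = Table.Group(Source, {"Index"}, aggregations)
in
    Grouped
```

This avoids both the lookup function and Expression.Evaluate, at the cost of relying on the source table already having the column types you want.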

Condition FOR/WHILE in M function - Paginate API in Power BI

I created the paginated API call below. It roughly works, but the offset parameter has to be the sequence number of the first record of each page rather than the page number: for example, 251 for the second page, and so on.
My record limit per page is 250.
The totalItems field returns the total number of records, for example 4500.
I divide the total number of records by the number of records per page to find out how many pages the API has: pageRange = {0..Number.RoundUp(totalItems / 250)}
With the code below, the records of the second page come back repeated: instead of passing the number 1 (referring to the second page), I should pass 251, then 501 on the next iteration, and so on until the whole sequence is finished (the API parameter is offset=).
I need to alter this line to include a FOR/WHILE for the "ufnCallAPI(_)" item: pages = List.Transform(pageRange, each ufnCallAPI(_)),
For example, the item above should become:
List.Transform(pageRange, each ufnCallAPI(_)),
List.Transform(pageRange, each ufnCallAPI(250)),
List.Transform(pageRange, each ufnCallAPI(500)),
List.Transform(pageRange, each ufnCallAPI(750)),
up to the total number totalItems
and I need a FOR/WHILE so that the API call receives not the page number but the offset at which the next page starts.
Thanks very much!
My code:
let
ufnCallAPI = (offSet) =>
let
query = Web.Contents("https://api.vhsys.com/v2/pedidos?offset=" & Number.ToText(offSet) & "&limit=250",
[Headers=[#"access-token"="OCKNYbAMaDgLBZBSQPCOGPWOXGSbdO", #"secret-access-token"="XXXXXXXXXXXXXX"]]),
result = Json.Document(query)
in
result,
tmpResult = ufnCallAPI(1),
auxTotal1 = Record.ToTable(tmpResult),
Value = auxTotal1{2}[Value],
auxTotal2 = Value[total],
totalItems = auxTotal2 -1,
pageRange = {0..Number.RoundUp(totalItems / 250)},
pages =List.Transform(pageRange, each ufnCallAPI(_)),
pages2 = Table.FromList(pages, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
pages3 = Table.ExpandRecordColumn(pages2, "Column1", {"code", "status", "paging", "data"}, {"Column1.code", "Column1.status", "Column1.paging", "Column1.data"}),
pages4 = Table.ExpandListColumn(pages3, "Column1.data"),
pages5 = Table.RemoveColumns(pages4,{"Column1.code", "Column1.status", "Column1.paging"}),
data = Table.ExpandRecordColumn(pages5, "Column1.data", {"id_ped", "id_pedido", "id_cliente", "nome_cliente", "id_local_retirada", "id_local_cobranca", "vendedor_pedido", "vendedor_pedido_id", "listapreco_produtos", "valor_total_produtos", "desconto_pedido", "desconto_pedido_porc", "peso_total_nota", "peso_total_nota_liq", "frete_pedido", "valor_total_nota", "valor_baseICMS", "valor_ICMS", "valor_baseST", "valor_ST", "valor_IPI", "condicao_pagamento_id", "condicao_pagamento", "frete_por_pedido", "transportadora_pedido", "id_transportadora", "data_pedido", "prazo_entrega", "referencia_pedido", "obs_pedido", "obs_interno_pedido", "status_pedido", "contas_pedido", "comissao_pedido", "estoque_pedido", "ordemc_emitido", "data_cad_pedido", "data_mod_pedido", "id_aplicativo", "id_pedido_aplicativo", "lixeira"}, {"id_ped", "id_pedido", "id_cliente", "nome_cliente", "id_local_retirada", "id_local_cobranca", "vendedor_pedido", "vendedor_pedido_id", "listapreco_produtos", "valor_total_produtos", "desconto_pedido", "desconto_pedido_porc", "peso_total_nota", "peso_total_nota_liq", "frete_pedido", "valor_total_nota", "valor_baseICMS", "valor_ICMS", "valor_baseST", "valor_ST", "valor_IPI", "condicao_pagamento_id", "condicao_pagamento", "frete_por_pedido", "transportadora_pedido", "id_transportadora", "data_pedido", "prazo_entrega", "referencia_pedido", "obs_pedido", "obs_interno_pedido", "status_pedido", "contas_pedido", "comissao_pedido", "estoque_pedido", "ordemc_emitido", "data_cad_pedido", "data_mod_pedido", "id_aplicativo", "id_pedido_aplicativo", "lixeira"}),
#"Tipo Alterado" = Table.TransformColumnTypes(data,{{"id_ped", type text}, {"id_pedido", Int64.Type}, {"nome_cliente", type text}, {"valor_total_produtos", type text}}),
#"Valor Substituído" = Table.ReplaceValue(#"Tipo Alterado",".",",",Replacer.ReplaceText,{"valor_total_produtos"}),
#"Tipo Alterado1" = Table.TransformColumnTypes(#"Valor Substituído",{{"valor_total_produtos", Currency.Type}}),
#"Valor Substituído1" = Table.ReplaceValue(#"Tipo Alterado1",".",",",Replacer.ReplaceValue,{"desconto_pedido", "desconto_pedido_porc", "peso_total_nota", "peso_total_nota_liq", "frete_pedido", "valor_total_nota", "valor_baseICMS", "valor_ICMS", "valor_baseST", "valor_ST", "valor_IPI"}),
#"Tipo Alterado2" = Table.TransformColumnTypes(#"Valor Substituído1",{{"desconto_pedido", Currency.Type}, {"desconto_pedido_porc", Currency.Type}, {"peso_total_nota", Currency.Type}, {"peso_total_nota_liq", Currency.Type}, {"frete_pedido", Currency.Type}, {"valor_total_nota", type text}, {"valor_baseICMS", Currency.Type}, {"valor_ICMS", Currency.Type}, {"valor_baseST", Currency.Type}, {"valor_ST", Currency.Type}, {"valor_IPI", Currency.Type}, {"prazo_entrega", type text}, {"data_pedido", type date}}),
#"Colunas Removidas" = Table.RemoveColumns(#"Tipo Alterado2",{"id_aplicativo", "id_pedido_aplicativo", "lixeira"}),
#"Tipo Alterado3" = Table.TransformColumnTypes(#"Colunas Removidas",{{"valor_total_nota", type text}}),
#"Valor Substituído2" = Table.ReplaceValue(#"Tipo Alterado3",".",",",Replacer.ReplaceText,{"valor_total_nota"}),
#"Tipo Alterado4" = Table.TransformColumnTypes(#"Valor Substituído2",{{"valor_total_nota", Currency.Type}})
in
#"Tipo Alterado4"
If I understood your web-scraping problem correctly, you don't need a for/while loop.
Please review the links below, which might help with your issue.
https://www.youtube.com/watch?v=n0bSddoCmss
https://www.myonlinetraininghub.com/scrape-data-multiple-web-pages-power-query
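To illustrate why no loop is needed: the list of offsets can be generated up front and mapped over with List.Transform. The following is a minimal sketch, assuming the page size of 250 and the example total of 4500 from the question, and assuming the API's offset is zero-based (0, 250, 500, ...), which should be checked against the API's documentation. The fakeCallAPI function is a hypothetical stand-in for ufnCallAPI so that the sketch runs without network access.

```powerquery
let
    // Values taken from the question's description
    pageSize = 250,
    totalItems = 4500,
    pageCount = Number.RoundUp(totalItems / pageSize),
    // One offset per page: 0, 250, 500, ... (assumes a zero-based offset)
    offsets = List.Transform({0 .. pageCount - 1}, each _ * pageSize),
    // Hypothetical stand-in for ufnCallAPI: returns the query string it would request
    fakeCallAPI = (offSet as number) as text =>
        "offset=" & Number.ToText(offSet) & "&limit=" & Number.ToText(pageSize),
    pages = List.Transform(offsets, each fakeCallAPI(_))
in
    pages
```

In the question's query, the equivalent change would be to multiply the page number by 250 before passing it on, for example pages = List.Transform(pageRange, each ufnCallAPI(_ * 250)).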

Multiple row into one row

I'm new to Power BI and I'm looking for some help with a transformation.
What I'm trying to do with Power Query:
First, I want to group by the following columns: call_key, ivr_agent, cli, dnis, lang_id.
After that, I need to copy all the other info into a single row.
The second row with the same call_key (and the other grouping columns) needs to go into new columns.
In short:
I need all rows with the same call_key to end up on one row only.
Test Excel file: https://1drv.ms/x/s!AqE6W5akVSvUh59KfGmUiCSnZH6OVg
Thank you so much for your help,
Phil
I couldn't tell exactly whether you needed the rows in new columns or just merged into a single one.
For merging in a single one, try this query:
let
Origen = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
RemoveBlanks = Table.SelectRows(Origen, each [call_key] <> null and [call_key] <> ""),
CombineCols = Table.AddColumn(RemoveBlanks, "MergedCol", each Text.Combine({Text.From([action_time], "es-CO"), [ivr_module], [action_location], [action_type], [action], [action_data1_desc], Text.From([action_data1_value], "es-CO"), [action_data2_desc], [action_data2_value], [action_data3_desc], Text.From([action_data3_value], "es-CO")}, "|"), type text),
RemoveCols = Table.SelectColumns(CombineCols,{"call_key", "ivr_agent", "cli", "dnis", "lang_id", "MergedCol"}),
GroupAndMerge = Table.Group(RemoveCols, {"call_key", "ivr_agent", "cli", "dnis", "lang_id"}, {{"New", each Text.Combine([MergedCol], "#(lf)"), type text}})
in
GroupAndMerge
EDIT: You may split it again, like this:
let
Origen = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
RemoveBlanks = Table.SelectRows(Origen, each [call_key] <> null and [call_key] <> ""),
CombineCols = Table.AddColumn(RemoveBlanks, "MergedCol", each Text.Combine({Text.From([action_time], "es-CO"), [ivr_module], [action_location], [action_type], [action], [action_data1_desc], Text.From([action_data1_value], "es-CO"), [action_data2_desc], [action_data2_value], [action_data3_desc], Text.From([action_data3_value], "es-CO")}, "|"), type text),
RemoveCols = Table.SelectColumns(CombineCols,{"call_key", "ivr_agent", "cli", "dnis", "lang_id", "MergedCol"}),
GroupAndMerge = Table.Group(RemoveCols, {"call_key", "ivr_agent", "cli", "dnis", "lang_id"}, {{"New", each Text.Combine([MergedCol], "#(lf)"), type text}}),
SplitColumn = Table.SplitColumn(GroupAndMerge, "New", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), {"New.1", "New.2", "New.3", "New.4", "New.5", "New.6", "New.7", "New.8", "New.9", "New.10", "New.11", "New.12", "New.13", "New.14", "New.15", "New.16", "New.17", "New.18", "New.19", "New.20", "New.21", "New.22", "New.23", "New.24", "New.25", "New.26", "New.27", "New.28", "New.29", "New.30", "New.31", "New.32", "New.33", "New.34", "New.35", "New.36", "New.37", "New.38", "New.39", "New.40", "New.41", "New.42", "New.43", "New.44", "New.45", "New.46", "New.47", "New.48", "New.49", "New.50", "New.51", "New.52", "New.53", "New.54", "New.55", "New.56", "New.57", "New.58", "New.59"})
in
SplitColumn
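The hard-coded list of 59 column names can also be generated. The following is a minimal sketch, assuming a small made-up table in place of the GroupAndMerge step: it finds the largest number of "|"-separated pieces in any row and builds the names New.1, New.2, ... from that.

```powerquery
let
    // Hypothetical sample standing in for the GroupAndMerge step
    GroupAndMerge = #table(
        type table [call_key = text, New = text],
        {{"k1", "a|b|c"}, {"k2", "d|e"}}
    ),
    // Largest number of "|"-separated pieces found in any row
    maxParts = List.Max(
        List.Transform(GroupAndMerge[New], each List.Count(Text.Split(_, "|")))
    ),
    // Generate New.1 .. New.N instead of typing the names out
    newNames = List.Transform({1 .. maxParts}, each "New." & Text.From(_)),
    SplitColumn = Table.SplitColumn(
        GroupAndMerge,
        "New",
        Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv),
        newNames
    )
in
    SplitColumn
```

Rows with fewer pieces than maxParts get null in the remaining columns.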

Converting non structured key value pairs data to a table

I have data in the following format, sample shown below:
ValA=101
ValB=2938
ValA=998
ValB=387
ValA=876
ValB=9832
I know that each set of ValA & ValB are a set of values that belong together, so output will be:
ValA ValB
101 2938
998 387
.......
.......
I need to get this into a tabular format so each valA ValB pair is one row.
I've tried doing this in Power Query by splitting on the = sign and then pivoting on the Val name, but it doesn't work.
Any idea how this might be easily achieved in Power Query?
Thanks!
I ended up doing the exact same as Lukasz, here's the full code:
let
Source = "ValA=101
ValB=2938
ValA=998
ValB=387
ValA=876
ValB=9832",
Custom1 = Lines.FromText(Source),
#"Converted to Table" = Table.FromList(Custom1, Splitter.SplitTextByDelimiter("="), null, null, ExtraValues.Error),
ChangedType = Table.TransformColumnTypes(#"Converted to Table",{{"Column1", type text}, {"Column2", Int64.Type}}),
CustomA = Table.AddColumn(ChangedType, "ValA", each if [Column1] = "ValA" then [Column2] else null),
CustomB = Table.AddColumn(CustomA, "ValB", each if [Column1] = "ValB" then [Column2] else null),
FilledDown = Table.FillDown(CustomB,{"ValA"}),
FilteredRows = Table.SelectRows(FilledDown, each [ValB] <> null)
in
FilteredRows
Lukasz's second idea using pivot columns looks like this:
let
Source = "ValA=101
ValB=2938
ValA=998
ValB=387
ValA=876
ValB=9832",
Custom1 = Lines.FromText(Source),
#"Converted to Table" = Table.FromList(Custom1, Splitter.SplitTextByDelimiter("="), null, null, ExtraValues.Error),
ChangedType = Table.TransformColumnTypes(#"Converted to Table",{{"Column1", type text}, {"Column2", Int64.Type}}),
AddedIndex = Table.AddIndexColumn(ChangedType, "Index", 0, 1),
IntegerDividedColumn = Table.TransformColumns(AddedIndex, {{"Index", each Number.IntegerDivide(_, 2), Int64.Type}}),
PivotedColumn = Table.Pivot(IntegerDividedColumn, List.Distinct(IntegerDividedColumn[Column1]), "Column1", "Column2")
in
PivotedColumn
The trick I found was to add an index column divided by two (so it goes 0, 0, 1, 1, 2, 2...), so the pivot knows that the first two rows are related, then the next two, and so on.
You can do the following:
1) Create two new calculated columns with logic like: if Column1 contains ValA then Column1 else null. Use the same logic for ValB in the second column.
2) Use the fill down feature on the leftmost column. This will produce rows with values for both ValA and ValB in distinct columns.
3) Use the filter feature to filter out rows that have nulls in your two new columns.
That should give you what you want.
Edit: thinking about this more, you might also try splitting Column1 on the equals sign, then pivoting the new column; it should produce two columns with the discrete values. HTH.