Power Query - Loop to extract text between delimiters - powerbi

I'm trying to extract text between delimiters for all available matches. The input column and the desired output are shown below:
Index | Country (input) | Country (desired output)
0 | 1, USA; 2, France; 3, Germany; | USA, France, Germany
1 | 4, Spain; | Spain
2 | 1, USA; 5, Italy; | USA, Italy
I tried to use the "Extract" and "Split columns" features by using ", " and ";" as delimiters but it didn't work as desired. I also tried to use Text.BetweenDelimiters and Splitter.SplitTextByEachDelimiter but I couldn't come up with a solution.
I wanted to write a loop in Power Query that can extract this data recursively, until all countries are extracted to a new column for each row.
Any ideas? Thanks in advance!

It seems like what you are doing is splitting on semicolons, then splitting on commas, then combining the results. So let's do that:
Right click the column and split on semicolon, each occurrence of the delimiter, advanced option Rows
Right click the new column and split on comma, each occurrence of the delimiter, advanced option Columns
Right click the index and group
Edit the grouping formula in the formula bar or in Home .. Advanced Editor ... to replace whatever it has as a default and instead end with this, which combines all the rows using a ", " delimiter
, each Text.Combine([ColumnNameGoesHere],", "), type text}})
Sample code:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(Source, {{"Country (input)", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv)}}), "Country (input)"),
#"Split Column by Delimiter1" = Table.SplitColumn(#"Split Column by Delimiter", "Country (input)", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Country (input).1", "Country (input).2"}),
#"Grouped Rows" = Table.Group(#"Split Column by Delimiter1", {"Index"}, {{"Country (desired output)", each Text.Combine(List.Transform([#"Country (input).2"], Text.Trim), ", "), type text}})
in #"Grouped Rows"
~ ~ ~
I assume this is simplified data; otherwise it would be simpler to remove all digits and semicolons in a single step, then drop the leading ", " and tidy the space left before each comma
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Text" = Table.TransformColumns(Source,{{"Country (input)", each Text.Replace(Text.RemoveRange(Text.Remove(_, {"1","2","3","4","5","6","7","8","9","0",";"}),0,2), " ,", ","), type text}})
in #"Text"
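For the "loop" the question asks about, a list transformation may be enough. The following is a minimal sketch, not taken from either answer: it assumes every entry has the form "number, name;" and uses an inline #table as a hypothetical stand-in for Table1. ExtractCountries is an illustrative helper name.

```powerquery
let
    // Hypothetical inline sample standing in for Table1
    Source = #table(
        {"Index", "Country (input)"},
        {
            {0, "1, USA; 2, France; 3, Germany;"},
            {1, "4, Spain;"},
            {2, "1, USA; 5, Italy;"}
        }
    ),
    // Split on ";", drop blank pieces, keep the text after the first comma
    ExtractCountries = (input as text) as text =>
        let
            Pieces = List.Select(Text.Split(input, ";"), each Text.Trim(_) <> ""),
            Names = List.Transform(Pieces, each Text.Trim(Text.AfterDelimiter(_, ",")))
        in
            Text.Combine(Names, ", "),
    // Add the combined names as a new column, one value per row
    Result = Table.AddColumn(
        Source,
        "Country (desired output)",
        each ExtractCountries([#"Country (input)"]),
        type text
    )
in
    Result
```

List.Transform visits every piece of the split text, so no explicit recursion is needed, and the original column is left untouched.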

Related

Power BI - How to display only the values which appear more than once in a column

I have a table with multiple columns. One of these is 'EAN'. The values in this column are supposed to be unique. Unfortunately, this is not the case. Now I want to find all the values that appear more than once.
I tried FILTER, EARLIER, and COUNTROWS. Nothing gives the output I'm looking for.
Example:
Art A - 111
Art B - 123
Art C - 222
Art D - 222
Art E - 456
What I expect as output is just a table, column or chart where '222' appears.
Create your visual using the EAN field.
Then create a measure with the formula:
= COUNTROWS('Table')
and drag this measure into the filters pane, setting the condition to 'greater than 1'.
Here's one way just using Power Query and M Code:
let
//read in and type the data
Source = Excel.CurrentWorkbook(){[Name="Table8"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Art", type text}, {"Num", Int64.Type}}),
//select only those rows where there are at least two identical nums
filter = Table.SelectRows(#"Changed Type", each List.Count(List.Select(#"Changed Type"[Num], (li)=>li=[Num]))>1),
//sort the output to keep the duplicates together
#"Sorted Rows" = Table.Sort(filter,{{"Num", Order.Ascending}})
in
#"Sorted Rows"
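An alternative sketch, assuming the same Art/Num column names as the query above, groups by the number once instead of rescanning the whole column for every row. The inline #table is a hypothetical stand-in for Table8.

```powerquery
let
    // Hypothetical inline sample standing in for Table8
    Source = #table(
        {"Art", "Num"},
        {{"Art A", 111}, {"Art B", 123}, {"Art C", 222}, {"Art D", 222}, {"Art E", 456}}
    ),
    // One row per distinct Num, with its row count and the matching rows
    Grouped = Table.Group(
        Source,
        {"Num"},
        {{"Count", each Table.RowCount(_), Int64.Type}, {"Rows", each _, type table}}
    ),
    // Keep only the numbers that occur more than once
    Duplicated = Table.SelectRows(Grouped, each [Count] > 1),
    // Bring back the Art values that share each duplicated number
    Expanded = Table.ExpandTableColumn(Duplicated, "Rows", {"Art"})
in
    Expanded
```

With the sample data this should leave only the two rows whose Num is 222.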

How to erase rows with combined criteria?

How can I erase rows of a table based on criteria from two columns combined? For example, I want to erase rows of products that come from France (country column) AND have the word "wood" in their name (name column).
Add column, custom column with formula
= if [Country]="France" and [Name]="Wood" then 1 else 0
Then use filter drop down atop that new column to uncheck the selection for [ ] 1 while leaving [x] 0
Potentially you meant that the Name field would contain the letters wood within a larger string of text (like Woodland), so you could use instead
= if [Country]="France" and Text.Contains(Text.Lower([Name]),"wood") then 1 else 0
You can use the Table.SelectRows method:
let
Source = Excel.CurrentWorkbook(){[Name="Table10"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,
{{"Country", type text}, {"Name", type text}, {"Value", type text}}),
filtered = Table.SelectRows(#"Changed Type",
each not ([Country]="France" and Text.Contains([Name],"Wood",Comparer.OrdinalIgnoreCase)))
in
filtered

How to remove duplicate values in one column and concatenate values from another column where removal happens?

The Data:
ACCOUNT DESC
Gallup 1
Gallup 2
Phoenix 2
Red Rock 1
Red Rock 2
Albuquerque 1
The desired output:
ACCOUNT DESC
Gallup 1,2
Phoenix 2
Red Rock 1,2
Albuquerque 1
but in a general scope, as this is a small subset (many other accounts, 100+).
Is there a way to remove duplicate values of "ACCOUNT" and concatenate the "DESC" values from the removed duplicates where the "ACCOUNT" matched? (This is the preferred approach, if possible.)
Right click Account column, Group By
use New Column name : Data, Operation: All Rows,
Add Column, Custom Column, formula
=Table.Column([Data],"DESC")
Click on arrows at top of new column, extract value, use comma delimiter
Remove extra column
Full sample code if data was in Table1:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Grouped Rows" = Table.Group(Source, {"ACCOUNT"}, {{"Data", each _, type table}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.Column([Data],"DESC")),
#"Extracted Values" = Table.TransformColumns(#"Added Custom", {"Custom", each Text.Combine(List.Transform(_, Text.From), ","), type text}),
#"Removed Columns" = Table.RemoveColumns(#"Extracted Values",{"Data"})
in #"Removed Columns"
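The same result can probably be reached in one grouping step by combining the values inside Table.Group. This is a sketch under the assumption that DESC holds numbers, hence the Text.From conversion. The inline #table is a hypothetical stand-in for Table1.

```powerquery
let
    // Hypothetical inline sample standing in for Table1
    Source = #table(
        {"ACCOUNT", "DESC"},
        {
            {"Gallup", 1}, {"Gallup", 2}, {"Phoenix", 2},
            {"Red Rock", 1}, {"Red Rock", 2}, {"Albuquerque", 1}
        }
    ),
    // Group by ACCOUNT and join each group's DESC values with commas
    Grouped = Table.Group(
        Source,
        {"ACCOUNT"},
        {{"DESC", each Text.Combine(List.Transform([DESC], Text.From), ","), type text}}
    )
in
    Grouped
```

This avoids the intermediate Data and Custom columns, at the cost of having to edit the grouping formula by hand.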

Power BI: Count multiple values in column

I'm working with a dataset where the gender of all participants is described in a single column. I'd like to create two new columns, one per gender, and fill them with the number of males / females involved.
The syntax used in a column looks like this:
0::Male||1::Male||3::Male||4::Female
(so we have 4 participants, the value in col "Male" would be 3, in col "Female" 1)
Would you be so kind and help me to extract this information? ♥
I'm sorry to ask you as I know I'd be able to eventually find the solution by myself, but I'm really under pressure right now. :/
This is the screenshot of the column I wanna extract values from.
Thanks a lot to everyone who tries to help! :)
In powerquery, you'd need two splits (one on :: and one on ||, and then a pivot to get data into right format)
Select your data, including header column of participant_gender and load your table into powerquery with Data ... From Table/Range...[x] my data has headers. I assume below that this is the first table of your file, and is named Table1
Add Column..Index Column...
right click participant_gender column ... split column ... by delimiter ... custom ... || [x] each occurrence of the delimiter , Advanced options [x] rows
right click participant_gender column ... split column ... by delimiter ... custom ... :: [x] each occurrence of the delimiter (ignore advanced options)
Click to select both participant_gender.2 and Index columns .. right click group by ... group by (participant_gender.2) (index) new column name (count) operation (Count Rows), so that each participant is counted once instead of summing the participant numbers
Click to select participant_Gender.2 column ... Transform..pivot column...Values column (Count)
File...Close and Load to ... table
Code produced, which you can paste into Home .. Advanced Editor... if you wish
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"participant_gender", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Added Index", {{"participant_gender", Splitter.SplitTextByDelimiter("||", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "participant_gender"),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"participant_gender", type text}}),
#"Split Column by Delimiter1" = Table.SplitColumn(#"Changed Type1", "participant_gender", Splitter.SplitTextByDelimiter("::", QuoteStyle.Csv), {"participant_gender.1", "participant_gender.2"}),
#"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"participant_gender.1", Int64.Type}, {"participant_gender.2", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type2", {"participant_gender.2", "Index"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
#"Pivoted Column" = Table.Pivot(#"Grouped Rows", List.Distinct(#"Grouped Rows"[participant_gender.2]), "participant_gender.2", "Count", List.Sum)
in #"Pivoted Column"
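If only the two counts are needed, a shorter sketch without the pivot could look like this. It assumes the column is never null and that the labels are exactly "Male" and "Female". The inline #table and the helper column name Genders are hypothetical.

```powerquery
let
    // Hypothetical inline sample standing in for Table1
    Source = #table(
        {"participant_gender"},
        {{"0::Male||1::Male||3::Male||4::Female"}}
    ),
    // Split on "||", then keep the text after "::" for each participant
    Genders = Table.AddColumn(
        Source,
        "Genders",
        each List.Transform(
            Text.Split([participant_gender], "||"),
            (p) => Text.AfterDelimiter(p, "::")
        ),
        type list
    ),
    // Count the entries per label
    Male = Table.AddColumn(Genders, "Male",
        each List.Count(List.Select([Genders], (g) => g = "Male")), Int64.Type),
    Female = Table.AddColumn(Male, "Female",
        each List.Count(List.Select([Genders], (g) => g = "Female")), Int64.Type),
    // Drop the helper list column
    Result = Table.RemoveColumns(Female, {"Genders"})
in
    Result
```

For the sample value from the question this should give Male = 3 and Female = 1. Unlike the pivot, it will not pick up labels other than the two named here.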

Power Query - Data Transformation from a single column to a whole table

I have a requirement where I have a table like this -
Actual Table with 2 columns
Column1 | Column2
ColAValue $$ ColBValue $$ | New Row
ColCValue | Above Row
ColCValue2 | Above Row
$$ ColDValue | Above Row
ColAValue $$ ColBValue $$ ColCValue $$ ColDValue | New Row
ColAValue $$ ColBValue $$ ColCValue | New Row
$$ ColDValue | Above Row
I know by requirement, I would have 4 columns in my dataset leaving column 2.
I need my transformed table as a new table using query editor.
This is my expected output,
OutTable with 4 columns
Basically the column values are identified in order by delimiter $$ and if column2 says new row, then it is a new record else, it has to go and append on the current row as a new column value.
How can I transform my Input table to this output table in the query editor?
The final output Data type doesn't matter.
The initial step is to bring the row values from Above row into the New row with a delimiter and have it as a single row.
The key here is to create a grouping column that assigns each row to its resulting output row number. You can do this by looking up the index of the last row with "New Row" in Column2.
First, create an index column (under the Add Column tab).
Now you can create your grouping custom column by taking the maximal index as described above. The formula might look something like this:
List.Max(
Table.SelectRows(#"Prev Step Name",
(here) => [Index] >= here[Index] and here[Column2] = "New Row"
)[Index]
)
Your table should look like this now:
Now we use Group By (under Home tab), grouping by the Group column and aggregating over Column1.
But we're going to change the aggregation from List.Max to Text.Combine so that the code for this step is
= Table.Group(#"Added Custom", {"Group"},
{{"Concat", each Text.Combine([Column1]," "), type text}})
Now the table should look like this:
From here, you can do Split Column By Delimiter (under Home tab) using " $$ " as your delimiter.
Change any column names as desired and delete the Group column if you no longer want it and the result should be your required output.
The M code for the whole query:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45Wcs7PcQxLzClNVVBRUQBynGAcJR0lv9RyhaD8cqVYHbA6Z7AUUNwxKb8sFVPGCEMKYqQLTn3YbVaAm6iAZgCagwhpR9OB2zWxAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1),
#"Reordered Columns" = Table.ReorderColumns(#"Added Index",{"Index", "Column1", "Column2"}),
#"Added Custom" = Table.AddColumn(#"Reordered Columns", "Group", each List.Max(Table.SelectRows(#"Reordered Columns", (here) => [Index] >= here[Index] and here[Column2] = "New Row")[Index]), Int64.Type),
#"Grouped Rows" = Table.Group(#"Added Custom", {"Group"}, {{"Concat", each Text.Combine([Column1]," "), type text}}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Grouped Rows", "Concat", Splitter.SplitTextByDelimiter(" $$ ", QuoteStyle.Csv), {"COL1", "COL2", "COL3", "COL4"}),
#"Removed Columns" = Table.RemoveColumns(#"Split Column by Delimiter",{"Group"})
in
#"Removed Columns"
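On larger tables, the per-row Table.SelectRows lookup used for the Group column may become slow. One possible alternative, sketched here with a shortened hypothetical sample, is to mark the "New Row" rows with their index and fill that value down.

```powerquery
let
    // Hypothetical shortened sample of the input table
    Source = #table(
        {"Column1", "Column2"},
        {
            {"ColAValue $$ ColBValue $$", "New Row"},
            {"ColCValue", "Above Row"},
            {"$$ ColDValue", "Above Row"},
            {"ColAValue $$ ColBValue $$ ColCValue $$ ColDValue", "New Row"}
        }
    ),
    Indexed = Table.AddIndexColumn(Source, "Index", 1, 1),
    // Only rows that start a record get a group number; the rest stay null
    Marked = Table.AddColumn(
        Indexed,
        "Group",
        each if [Column2] = "New Row" then [Index] else null,
        type nullable number
    ),
    // Copy each group number down onto the "Above Row" rows that follow it
    Filled = Table.FillDown(Marked, {"Group"})
in
    Filled
```

The Group column produced this way can feed the same Table.Group and split steps as in the full query above.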