Concatenate comma delimited information from two columns into one - powerbi

I am trying to concatenate information from two columns into one column in PowerBI. If all the cells were the same, this would be really straightforward. The issue is that I am working with address information that is extracted from a records database with an interesting set-up that I have no control over.
The PowerBI report I have built is used to compare the records database to an online spreadsheet that technicians are using to mark changes that need to be made to the records based on changes they make in a map database. The comparison is done in a PowerBI merged table between records database and the spreadsheet.
Most records are for only one address, while about 10% of the records have multiple addresses. Currently, the comparison report is telling us that the addresses do not match on these 10%, even when they do match.
Data Example of current result only using concatenate:
Row Number | Street Number | Street Name | Concatenated Address
1 | 234 | Harvey St | 234 Harvey St
2 | 246 | Malone Ave | 246 Malone Ave
3 | 872, 954 | Bluebell Way, Main St | 872, 954 Bluebell Way, Main St
4 | 376, 3457, 78 | Harvey St, Bluebell Way, Malone Ave | 376, 3457, 78 Harvey St, Bluebell Way, Malone Ave
This is what I am trying to achieve using DAX. Before someone says to split it in Power Query and create more columns: I'd rather not, since the number of addresses can vary and I'm already at 46 columns including the ones below.
Data Example of the desired result:
Row Number | Street Number | Street Name | Concatenated Address
1 | 234 | Harvey St | 234 Harvey St
2 | 246 | Malone Ave | 246 Malone Ave
3 | 872, 954 | Bluebell Way, Main St | 872 Bluebell Way, 954 Main St
4 | 376, 3457, 78 | Harvey St, Bluebell Way, Malone Ave | 376 Harvey St, 3457 Bluebell Way, 78 Malone Ave
My thought is that maybe there is some way to use a delimiter with concatenating but I am not sure how.
Thank you in advance for anyone who can help me with solving this.

This worked for me in a similar need, although there may be simpler approaches I'm not aware of. Basic steps were to convert the columns to Lists, Zip the Lists, Expand the List, Extract Values, then Group and combine the results. Here's a bit more detail on each step I took:
Add Custom Column named [Custom] with this formula >
Text.Split([Street Number], ",")
Add another Custom Column named [Custom.1] with this formula >
Text.Split([Street Name], ",")
Add another Custom Column named [Custom.2] with this formula >
List.Zip({[Custom], [Custom.1]})
Expanded [Custom.2], formula bar shows this:
= Table.ExpandListColumn(#"Added Custom2", "Custom.2")
Extracted [Custom.2], formula bar shows this:
= Table.TransformColumns(#"Expanded Custom.2", {"Custom.2", each Text.Combine(List.Transform(_, Text.From), " "), type text})
Group on {"Row Number", "Street Number", "Street Name"}, but change the formula bar to use this function instead:
= Table.Group(#"Extracted Values", {"Row Number", "Street Number", "Street Name"}, {"Concatenated Address", each Text.Combine([Custom.2], ","), type text})
Here is the full Advanced Editor:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("XY6xCsMwDER/5fCspbYTu2M7denUIYPJ4IKGgkmhtIH8fSQMjZNFoNO706VkToaMdV7mLX9mXvD4VgXbPlIyVlXfy7zn8p4Yl5mrhEZQ0okcgyWcO429lh8/uRQMeSFhX1N9IQj2N+H/dw1Stws9wfkuEEJsSxKOwU0rcaEh1X/AQ9zVHlc=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Row Number" = _t, #"Street Number" = _t, #"Street Name" = _t, #"Concatenated Address" = _t]),
#"Added Custom" = Table.AddColumn(Source, "Custom", each Text.Split([Street Number], ",")),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom.1", each Text.Split([Street Name], ",")),
#"Added Custom2" = Table.AddColumn(#"Added Custom1", "Custom.2", each List.Zip({[Custom], [Custom.1]})),
#"Expanded Custom.2" = Table.ExpandListColumn(#"Added Custom2", "Custom.2"),
#"Extracted Values" = Table.TransformColumns(#"Expanded Custom.2", {"Custom.2", each Text.Combine(List.Transform(_, Text.From), " "), type text}),
#"Grouped Rows" = Table.Group(#"Extracted Values", {"Row Number", "Street Number", "Street Name"}, {"Concatenated Address", each Text.Combine([Custom.2], ","), type text})
in
#"Grouped Rows"
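A more compact variant of the same split, zip, and combine idea, offered as a sketch rather than a tested drop-in. In the question's sample data the values are separated by a comma followed by a space, so splitting on "," alone leaves a leading space on every item after the first; this version trims each item and does everything in one custom column, so no expand or group step is needed. The inline sample table and the step name AddAddress are my own assumptions, not part of the original query.

```powerquery
let
    // Hypothetical sample rows mirroring the question's data
    Source = #table(
        type table [#"Row Number" = Int64.Type, #"Street Number" = text, #"Street Name" = text],
        {
            {1, "234", "Harvey St"},
            {3, "872, 954", "Bluebell Way, Main St"},
            {4, "376, 3457, 78", "Harvey St, Bluebell Way, Malone Ave"}
        }
    ),
    // Split both columns on commas, trim each piece, pair the pieces by position,
    // then join each pair with a space and the pairs with ", "
    AddAddress = Table.AddColumn(
        Source,
        "Concatenated Address",
        each
            let
                numbers = List.Transform(Text.Split([Street Number], ","), Text.Trim),
                names = List.Transform(Text.Split([Street Name], ","), Text.Trim),
                pairs = List.Zip({numbers, names})
            in
                Text.Combine(List.Transform(pairs, each Text.Combine(_, " ")), ", "),
        type text
    )
in
    AddAddress
```

For row 3 this should give "872 Bluebell Way, 954 Main St". If a row has more numbers than names (or the reverse), List.Zip pads the shorter list with null, and Text.Combine skips nulls, so the unmatched item appears on its own.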

Splitting the columns into rows is the key. It takes too long to explain here, so check the sample file and let me know if it works for you.
Sample File

Related

Power BI - How to display only the values which appear more than once in a column

I have a table with multiple columns. One of these is 'EAN'. The values in this column are supposed to be unique. Unfortunately, this is not the case. Now I want to find all the values that appear more than once.
I tried FILTER, EARLIER, and COUNTROWS, but nothing gives the output I'm looking for.
Example:
Art A - 111
Art B - 123
Art C - 222
Art D - 222
Art E - 456
What I expect as output is just a table, column or chart where '222' appears.
Create your visual using the EAN field.
Then create a measure with the formula:
= COUNTROWS('Table')
and drag this measure into the filters pane, setting the condition to 'greater than 1'.
Here's one way just using Power Query and M Code:
let
//read in and type the data
Source = Excel.CurrentWorkbook(){[Name="Table8"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Art", type text}, {"Num", Int64.Type}}),
//select only those rows whose Num value appears at least twice
filter = Table.SelectRows(#"Changed Type", each List.Count(List.Select(#"Changed Type"[Num], (li)=>li=[Num]))>1),
//sort the output to keep the duplicates together
#"Sorted Rows" = Table.Sort(filter,{{"Num", Order.Ascending}})
in
#"Sorted Rows"
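An alternative sketch of the same idea that groups by Num once and keeps only the groups with more than one row, instead of rescanning the whole column for every row. The inline table is a hypothetical stand-in for Table8, and the step names are my own.

```powerquery
let
    // Hypothetical sample data matching the question's example
    Source = #table(
        type table [Art = text, Num = Int64.Type],
        {{"A", 111}, {"B", 123}, {"C", 222}, {"D", 222}, {"E", 456}}
    ),
    // One row per distinct Num, carrying the original rows and their count
    Counted = Table.Group(
        Source,
        {"Num"},
        {
            {"Rows", each _, type table [Art = text, Num = Int64.Type]},
            {"Count", each Table.RowCount(_), Int64.Type}
        }
    ),
    // Keep only the values that occur more than once
    Duplicated = Table.SelectRows(Counted, each [Count] > 1),
    // Bring back the original rows for those values
    Expanded = Table.ExpandTableColumn(Duplicated[[Rows]], "Rows", {"Art", "Num"})
in
    Expanded
```

With the sample data this should leave only the two rows whose Num is 222.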

Power Query - Loop to extract text between delimiters

I'm trying to extract text between delimiters for all available matches. The input column and the desired output are shown below:
Index | Country (input) | Country (desired output)
0 | 1, USA; 2, France; 3, Germany; | USA, France, Germany
1 | 4, Spain; | Spain
2 | 1, USA; 5, Italy; | USA, Italy
I tried to use the "Extract" and "Split columns" features by using ", " and ";" as delimiters but it didn't work as desired. I also tried to use Text.BetweenDelimiters and Splitter.SplitTextByEachDelimiter but I couldn't come up with a solution.
I wanted to write a loop in Power Query that can extract this data recursively, until all countries are extracted to a new column for each row.
Any ideas? Thanks in advance!
It seems like what you are doing is splitting on semicolons, then splitting on commas, then combining the results. So let's do that:
Right click the column and split on semicolon, each occurrence of the delimiter, advanced option Rows
Right click the new column and split on comma, each occurrence of the delimiter, advanced option Columns
Right click the index and group
Edit the grouping formula in the formula bar (or in Home > Advanced Editor) to replace the default aggregation so that it ends with this, which combines all the rows using a comma delimiter:
, each Text.Combine([ColumnNameGoesHere],", "), type text}})
Sample code:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(Source, {{"Country (input)", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv)}}), "Country (input)"),
#"Split Column by Delimiter1" = Table.SplitColumn(#"Split Column by Delimiter", "Country (input)", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Country (input).1", "Country (input).2"}),
#"Grouped Rows" = Table.Group(#"Split Column by Delimiter1", {"Index"}, {{"Country (desired output)", each Text.Combine([#"Country (input).2"],", "), type text}})
in #"Grouped Rows"
~ ~ ~
I assume this is simplified data; otherwise it would be simpler to just remove all numbers and semicolons in a single step:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Text" = Table.TransformColumns(Source,{{"Country (input)", each Text.RemoveRange(Text.Remove(_, {"1","2","3","4","5","6","7","8","9","0",";"}),0), type text}})
in #"Text"
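A third option, as a hedged sketch: do the whole extraction in one custom column by splitting on semicolons, trimming, and keeping the text after the first comma of each entry. The inline sample table and the step name AddOutput are assumptions for illustration; Text.AfterDelimiter takes everything after the first comma, so a country name that itself contains a comma would be kept whole.

```powerquery
let
    // Hypothetical sample rows copied from the question
    Source = #table(
        type table [Index = Int64.Type, #"Country (input)" = text],
        {
            {0, "1, USA; 2, France; 3, Germany;"},
            {1, "4, Spain;"},
            {2, "1, USA; 5, Italy;"}
        }
    ),
    AddOutput = Table.AddColumn(
        Source,
        "Country (desired output)",
        each
            let
                // Split into "n, Country" entries and drop the empty piece after the final semicolon
                entries = List.Select(
                    List.Transform(Text.Split([#"Country (input)"], ";"), Text.Trim),
                    each _ <> ""
                ),
                // Keep the text after the first comma of each entry
                countries = List.Transform(entries, each Text.Trim(Text.AfterDelimiter(_, ",")))
            in
                Text.Combine(countries, ", "),
        type text
    )
in
    AddOutput
```

For the first row this should produce "USA, France, Germany", with no leading spaces to clean up afterwards.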

How to remove duplicate values in one column and concatenate values from another column where removal happens?

The Data:
ACCOUNT DESC
Gallup 1
Gallup 2
Phoenix 2
Red Rock 1
Red Rock 2
Albuquerque 1
The desired output:
ACCOUNT DESC
Gallup 1,2
Phoenix 2
Red Rock 1,2
Albuquerque 1
I need this to work in general, as this is a small subset (there are 100+ other accounts).
Is there a way to remove duplicate values of "ACCOUNT" and concatenate the "DESC" values from the removed duplicates where the "ACCOUNT" matched? (This is the preferred approach, if possible.)
Right click Account column, Group By
use New Column name : Data, Operation: All Rows,
Add Column, Custom Column, formula
=Table.Column([Data],"DESC")
Click on arrows at top of new column, extract value, use comma delimiter
Remove extra column
Full sample code if data was in Table1:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Grouped Rows" = Table.Group(Source, {"ACCOUNT"}, {{"Data", each _, type table}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.Column([Data],"DESC")),
#"Extracted Values" = Table.TransformColumns(#"Added Custom", {"Custom", each Text.Combine(List.Transform(_, Text.From), ","), type text}),
#"Removed Columns" = Table.RemoveColumns(#"Extracted Values",{"Data"})
in #"Removed Columns"
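The grouping and the text combine can also be done in a single Table.Group step, which skips the helper columns. This is a sketch under the assumption that DESC holds numbers (hence the Text.From conversion); the inline table and the step name Grouped are made up for the example.

```powerquery
let
    // Hypothetical sample data from the question
    Source = #table(
        type table [ACCOUNT = text, DESC = Int64.Type],
        {
            {"Gallup", 1}, {"Gallup", 2}, {"Phoenix", 2},
            {"Red Rock", 1}, {"Red Rock", 2}, {"Albuquerque", 1}
        }
    ),
    // Within each ACCOUNT group, convert DESC values to text and join them with commas
    Grouped = Table.Group(
        Source,
        {"ACCOUNT"},
        {{"DESC", each Text.Combine(List.Transform([DESC], Text.From), ","), type text}}
    )
in
    Grouped
```

Gallup and Red Rock should each come out as "1,2", while Phoenix and Albuquerque keep their single value.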

How to Extract each number in a string (PowerBI)?

I have a free form column that includes numbers and characters. My goal is to be able to extract each number into its own column. Calculated Columns or M code is fine. Here is an example:
Segment | Notes
1 | WO# 1234567 Call Tony # 623-623-6236 30 prior to arrival
2 | Replaced 2 Hoses 7654321
3 | Opened WO5674321 on 11/20/2019
Ultimately, what I need is each number in each observation in its own column, like this:
Segment | Notes | Num1 | Num2 | Num3
1 | WO# 1234567 Call Tony # 623-623-6236 30 prior to arrival | 1234567 | 623-623-6236 | 30
2 | Replaced 2 Hoses 7654321 | 2 | 7654321 |
3 | Opened WO5674321 on 11/20/2019 | 5674321 | 11/20/2019 |
If it is too difficult to extract dates and phone numbers in their entirety I can live with each element going into its own column. Thanks in advance.
There's a Value.Is function we can take advantage of. The query below keeps only the characters that are digits, spaces, hyphens, or slashes:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("LY0xC8IwFIT/yvFcK/a9tClugku3gAgdQoagGQohCakI/fdGEe6m7+POWmLqaDEHsKhh1BOuPkbcc9pxgRZ1/FdD9Sh1zRWvDF/r+vaRXGdJ2sAtlOgf4QnBnLewYdLjoIR/gmqCKSE1vJh28QXICcwn6Vv4TM59AA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Segment = _t, Notes = _t]),
#"Added Custom" = Table.AddColumn(Source, "Custom", each Text.Combine(List.RemoveNulls(List.Transform(Text.ToList([Notes]),each if Value.Is(Value.FromText(_), type number) or List.Contains({" ", "-", "/"}, _) then _ else null))))
in
#"Added Custom"

How to clean location data in power bi

I've currently got two tables. I have one table with a list of locations as such:
Zagreb (Croatia)
Seattle, WA, USA
New York City, NY
Kazakhstan, Almaty
I also have a master list of 200k cities that looks like this:
Zagreb | Croatia
Seattle | USA
New York City | USA
Almaty | Kazakhstan
The output I want is to add a new column to the first table as below:
Zagreb (Croatia) | Croatia
Seattle, WA, USA | USA
New York City, NY | USA
Kazakhstan, Almaty | Kazakhstan
This is updated from a live source whose data quality I can't control, so any solution must be dynamic.
Any ideas appreciated!
One possible approach would be to add a custom column to the first table that searches the string for any cities that appear in the second table City column.
= Table.AddColumn(#"Changed Type", "City",
(L) => List.Select(Cities[City], each Text.Contains(L[Location], _)))
This gives a list of matching cities. Expand that list so that each match gets its own row.
You can then merge with the Cities table (matching on the City columns from each table) to pull over the Country column.
Here's the full text of my query from the advanced editor:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WikpML0pNUtBwLspPLMlM1FSK1YlWCk5NLCnJSdVRCHfUUQgNdgQL+qWWK0TmF2UrOGeWVOoo+EWCRb0TqxKzM4pLEvN0FBxzchNLKpViYwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Location = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Location", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "City", (L) => List.Select(Cities[City], each Text.Contains(L[Location], _))),
#"Expanded City" = Table.ExpandListColumn(#"Added Custom", "City"),
#"Merged Queries" = Table.NestedJoin(#"Expanded City",{"City"},Cities,{"City"},"Cities",JoinKind.LeftOuter),
#"Expanded Cities" = Table.ExpandTableColumn(#"Merged Queries", "Cities", {"Country"}, {"Country"})
in
#"Expanded Cities"
Name the first table "location", with one column named "location".
Name the second table "city", with two columns named "city" and "country".
The code is:
let
location = Excel.CurrentWorkbook(){[Name="location"]}[Content],
city = Excel.CurrentWorkbook(){[Name="city"]}[Content],
result = Table.AddColumn(location,"city",each Table.SelectRows(city,(x)=>Text.Contains([location],x[city]))[country]{0})
in
result
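One caveat with indexing by {0}: a location that matches no city raises an error for that row. A hedged variant is sketched below that returns null instead, using the optional item access operator {0}?, and that ignores case when matching. The inline tables (including the unmatched "Somewhere else" row) are hypothetical sample data, and matching case-insensitively is my own assumption about what is wanted here.

```powerquery
let
    // Hypothetical sample tables standing in for the "location" and "city" workbook tables
    location = #table(
        type table [location = text],
        {{"Zagreb (Croatia)"}, {"Seattle, WA, USA"}, {"Somewhere else"}}
    ),
    city = #table(
        type table [city = text, country = text],
        {{"Zagreb", "Croatia"}, {"Seattle", "USA"}}
    ),
    result = Table.AddColumn(
        location,
        "country",
        each
            let
                loc = [location],
                // All master-list rows whose city name appears in the location text
                matches = Table.SelectRows(
                    city,
                    (c) => Text.Contains(loc, c[city], Comparer.OrdinalIgnoreCase)
                )
            in
                // First matching country, or null when nothing matched
                matches[country]{0}?,
        type nullable text
    )
in
    result
```

With a 200k-row city list this still scans the whole list for every location, so it is likely to be slow on large inputs.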