How to extract each number in a string (Power BI)?

I have a free form column that includes numbers and characters. My goal is to be able to extract each number into its own column. Calculated Columns or M code is fine. Here is an example:
Segment   Notes
1         WO# 1234567 Call Tony # 623-623-6236 30 prior to arrival
2         Replaced 2 Hoses 7654321
3         Opened WO5674321 on 11/20/2019
Ultimately What I need is each number in each observation in its own column like this:
Segment   Notes                                                       Num1      Num2           Num3
1         WO# 1234567 Call Tony # 623-623-6236 30 prior to arrival    1234567   623-623-6236   30
2         Replaced 2 Hoses 7654321                                    2         7654321
3         Opened WO5674321 on 11/20/2019                              5674321   11/20/2019
If it is too difficult to extract dates and phone numbers in their entirety, I can live with each element going into its own column. Thanks in advance.

There's a Value.Is function we can take advantage of.
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("LY0xC8IwFIT/yvFcK/a9tClugku3gAgdQoagGQohCakI/fdGEe6m7+POWmLqaDEHsKhh1BOuPkbcc9pxgRZ1/FdD9Sh1zRWvDF/r+vaRXGdJ2sAtlOgf4QnBnLewYdLjoIR/gmqCKSE1vJh28QXICcwn6Vv4TM59AA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Segment = _t, Notes = _t]),
    #"Added Custom" = Table.AddColumn(Source, "Custom", each Text.Combine(List.RemoveNulls(List.Transform(Text.ToList([Notes]), each if Value.Is(Value.FromText(_), type number) or List.Contains({" ", "-", "/"}, _) then _ else null))))
in
    #"Added Custom"
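The Custom column still holds all of a row's numbers in one text value. As a follow-on sketch (the step names, the column names Num1-Num3, and the assumption that no row has more than three numbers are mine), you can collapse the leftover runs of spaces and then split the text into columns:

```m
let
    // ...steps above, ending with #"Added Custom"...
    // collapse the runs of spaces left behind by the stripped characters
    Cleaned = Table.TransformColumns(#"Added Custom",
        {"Custom", each Text.Combine(List.Select(Text.Split(_, " "), (t) => t <> ""), " "), type text}),
    // split the cleaned text on single spaces, one column per number
    Split = Table.SplitColumn(Cleaned, "Custom",
        Splitter.SplitTextByDelimiter(" ", QuoteStyle.None),
        {"Num1", "Num2", "Num3"})
in
    Split
```

Rows with fewer numbers get nulls in the trailing columns, which matches the ragged layout in the desired output.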

Related

How to consider a specific list of dates for calculating standard deviation in Power BI

I have a table which contains three columns. Column 1 - Employees (A & B), Column 2 - List of dates (different for each employee), Column 3 - Change in price for each list of dates.
I have to calculate the standard deviation for each employee based on the change in price for the two different lists of dates.
When I use the standard deviation formula in Power Query or in Power BI, it computes the result over all the dates, not over each employee's specific list of dates. For example, if the total dates run from 1st January to 31st January, and employee A's list of dates is the 1st to the 10th while employee B's is the 20th to the 31st, the formula calculates the standard deviation for the 1st to the 31st rather than for each employee's specific dates.
Is it possible for me to do this in power query? Any help would be appreciated.
PowerQuery is made for ETL, DAX is made for Analysis.
If you need a result column use:
Stdev of change = STDEV.S('Table'[Change in price])
If you want to have single results:
Stdev A =
CALCULATE(
    STDEV.S('Table'[Change in price]),
    'Table'[Employee] = "A"
)
Stdev B =
CALCULATE(
    STDEV.S('Table'[Change in price]),
    'Table'[Employee] = "B"
)
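If you would rather not hardcode one measure per employee, a calculated column can repeat each employee's result on every row. This is a sketch, assuming the table really is named 'Table'; CALCULATE with ALLEXCEPT restricts the standard deviation to the current row's employee:

```dax
Stdev per employee =
CALCULATE(
    STDEV.S('Table'[Change in price]),
    -- keep only the current row's Employee filter, drop all other row filters
    ALLEXCEPT('Table', 'Table'[Employee])
)
```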
Sample data:
In PowerQuery use:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("ddAxDoAgDIXhuzBrgi1FGPUahPtfwz4dBFsTl/cNpP6thSMsgSLxGjf97hH68nHSsTnOOtjxhCHWBe84nuHF+o5B1svPnVWHfV4v1zMfPwcnuETrQ4bJkaE4zm+eyZGhOo4M4tyDDMnx/c08OTIkx5EhW+efDoz/zer9Ag==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Employee = _t, Date = _t, #"Change in price" = _t]),
#"Changed Type" = Table.TransformColumnTypes(
Source,
{
{"Employee", type text},
{"Date", type date},
{"Change in price", Int64.Type}
}
),
#"Grouped Rows" = Table.Group(
#"Changed Type", {"Employee"},
{
{"Stdev", each List.StandardDeviation([Change in price]), type nullable number}
}
)
in
#"Grouped Rows"
Basically you start with the "Group By" GUI, and then go into the M code and replace List.Average with List.StandardDeviation, since this operation is not directly available in the GUI.

Add "missing" date rows for respective group and default value in another column in PowerBI?

I'm using PowerBI and looking to summarize (average) data over a period of time; however, I realized that my source data doesn't reflect "empty" (zero-total) date values. These are valid and required for accurately aggregating totals over a period of time.
I've created a new date table using the following expression, to create all the dates within the preliminary tables range:
Date_Table = CALENDAR(MIN('SalesTable'[Daily Sales Date]),MAX('SalesTable'[Daily Sales Date]))
However, when trying to create a relationship between the created table and the original SalesTable to fill in the "missing dates", I haven't been successful. If anyone has encountered a similar issue and has any advice, or could point me towards resources to resolve this, I would be greatly appreciative.
I've included an example of my current and expected results below. Thanks!
current:
Item Group   Daily Sales Date   Total
Fruit        January 1          5
Vegetable    January 5          10
expected:
Item Group   Daily Sales Date   Total
Fruit        January 1          5
Fruit        January 2          0
Fruit        January 3          0
Fruit        January 4          0
Fruit        January 5          0
Vegetable    January 1          0
Vegetable    January 2          0
Vegetable    January 3          0
Vegetable    January 4          0
Vegetable    January 5          10
To do this in Power Query as you request, you can create the Date Table in Power Query, then Join it with each group in the Item Group column:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcisqzSxR0lHySswrTSyqVDAEsk2VYnWilcJS01NLEpNyUpFkTYFsQwOl2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Item Group" = _t, #"Daily Sales Date" = _t, Total = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{
{"Item Group", type text},
{"Daily Sales Date", type date},
{"Total", Int64.Type}}),
//create a table with a list of all dates for the date range in the table
allDates = Table.FromColumns({
List.Dates(List.Min(#"Changed Type"[Daily Sales Date]),
Duration.Days(List.Max(#"Changed Type"[Daily Sales Date]) - List.Min(#"Changed Type"[Daily Sales Date]))+1,
#duration(1,0,0,0))},type table[Dates=date]),
//group by the item group column
//Then join each subtable with the allDates table
group = Table.Group(#"Changed Type",{"Item Group"},{
{"Daily Sales Date", each Table.Join(_,"Daily Sales Date",allDates,"Dates",JoinKind.RightOuter)}
}),
//Expand the grouped table
#"Expanded Daily Sales Date" = Table.ExpandTableColumn(group, "Daily Sales Date", {"Total", "Dates"}, {"Total", "Dates"}),
//replace the nulls with zero's
#"Replaced Value" = Table.ReplaceValue(#"Expanded Daily Sales Date",null,0,Replacer.ReplaceValue,{"Total"}),
//set proper column order and types
#"Reordered Columns" = Table.ReorderColumns(#"Replaced Value",{"Item Group", "Dates", "Total"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Reordered Columns",{{"Dates", type date}, {"Total", Int64.Type}})
in
#"Changed Type1"
If you wanted to average over the existing date range, you can try this:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcisqzSxR0lHySswrTSyqVDAEsk2VYnWilcJS01NLEpNyUpFkTYFsQwOl2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Item Group" = _t, #"Daily Sales Date" = _t, Total = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{
{"Item Group", type text},
{"Daily Sales Date", type date},
{"Total", Int64.Type}}),
//count the number of dates
numDates = Duration.Days(List.Max(#"Changed Type"[Daily Sales Date]) - List.Min(#"Changed Type"[Daily Sales Date]))+1,
//group by Item Group, then average using Sum/Number of dates for each subgroup
#"Grouped Rows" = Table.Group(#"Changed Type", {"Item Group"}, {
{"Average", each List.Sum([Total])/numDates}})
in
#"Grouped Rows"
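For the relationship approach attempted in the question, there is also an option that avoids adding rows altogether: a measure that returns 0 instead of blank, so every date in Date_Table shows up in visuals. This is a sketch, assuming Date_Table[Date] is related to SalesTable[Daily Sales Date]:

```dax
Total Sales =
-- COALESCE turns the blank produced by dates with no sales into 0
COALESCE(SUM('SalesTable'[Total]), 0)
```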
And there are numerous other ways of accomplishing what you probably require.

Concatenate comma delimited information from two columns into one

I am trying to concatenate information from two columns into one column in PowerBI. If all the cells were the same, this would be really straight forward. The issue is that I am working with address information that is extracted from a records database with an interesting set-up that I have no control over.
The PowerBI report I have built is used to compare the records database to an online spreadsheet that technicians are using to mark changes that need to be made to the records based on changes they make in a map database. The comparison is done in a PowerBI merged table between records database and the spreadsheet.
Most records are for only one address, while about 10% of the records have multiple addresses. Currently, the comparison report is telling us that the addresses do not match on these 10%, even when the addresses are a match.
Data Example of current result only using concatenate:
Row Number   Street Number   Street Name                           Concatenated Address
1            234             Harvey St                             234 Harvey St
2            246             Malone Ave                            246 Malone Ave
3            872, 954        Bluebell Way, Main St                 872, 954 Bluebell Way, Main St
4            376, 3457, 78   Harvey St, Bluebell Way, Malone Ave   376, 3457, 78 Harvey St, Bluebell Way, Malone Ave
This is what I am trying to achieve using DAX. So before someone says to split it in the Power Query and create more columns, I'd rather not, since the number of addresses can vary and I'm already at 46 columns including the ones below.
Data Example of the desired result:
Row Number   Street Number   Street Name                           Concatenated Address
1            234             Harvey St                             234 Harvey St
2            246             Malone Ave                            246 Malone Ave
3            872, 954        Bluebell Way, Main St                 872 Bluebell Way, 954 Main St
4            376, 3457, 78   Harvey St, Bluebell Way, Malone Ave   376 Harvey St, 3457 Bluebell Way, 78 Malone Ave
My thought is that maybe there is some way to use a delimiter with concatenating but I am not sure how.
Thank you in advance for anyone who can help me with solving this.
This worked for me in a similar need, although there may be simpler approaches I'm not aware of. Basic steps were to convert the columns to Lists, Zip the Lists, Expand the List, Extract Values, then Group and combine the results. Here's a bit more detail on each step I took:
Add Custom Column named [Custom] with this formula >
Text.Split([Street Number], ",")
Add another Custom Column named [Custom.1] with this formula >
Text.Split([Street Name], ",")
Add another Custom Column named [Custom.2] with this formula >
List.Zip({[Custom], [Custom.1]})
Expanded [Custom.2], formula bar shows this:
= Table.ExpandListColumn(#"Added Custom2", "Custom.2")
Extracted [Custom.2], formula bar shows this:
= Table.TransformColumns(#"Expanded Custom.2", {"Custom.2", each Text.Combine(List.Transform(_, Text.From), " "), type text})
Group on {"Row Number", "Street Number", "Street Name"}, but change the formula bar to use this function instead:
= Table.Group(#"Removed Other Columns", {"Row Number", "Street Number", "Street Name"}, {"Concatenated Address", each Text.Combine([Custom.2], ","), type text})
Here is the full Advanced Editor:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("XY6xCsMwDER/5fCspbYTu2M7denUIYPJ4IKGgkmhtIH8fSQMjZNFoNO706VkToaMdV7mLX9mXvD4VgXbPlIyVlXfy7zn8p4Yl5mrhEZQ0okcgyWcO429lh8/uRQMeSFhX1N9IQj2N+H/dw1Stws9wfkuEEJsSxKOwU0rcaEh1X/AQ9zVHlc=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Row Number" = _t, #"Street Number" = _t, #"Street Name" = _t, #"Concatenated Address" = _t]),
#"Added Custom" = Table.AddColumn(Source, "Custom", each Text.Split([Street Number], ",")),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom.1", each Text.Split([Street Name], ",")),
#"Added Custom2" = Table.AddColumn(#"Added Custom1", "Custom.2", each List.Zip({[Custom], [Custom.1]})),
#"Expanded Custom.2" = Table.ExpandListColumn(#"Added Custom2", "Custom.2"),
#"Extracted Values" = Table.TransformColumns(#"Expanded Custom.2", {"Custom.2", each Text.Combine(List.Transform(_, Text.From), " "), type text}),
#"Grouped Rows" = Table.Group(#"Extracted Values", {"Row Number", "Street Number", "Street Name"}, {"Concatenated Address", each Text.Combine([Custom.2], ","), type text})
in
#"Grouped Rows"
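Since the question specifically asks for DAX: the PATH functions treat any pipe-delimited text as a path, so swapping the commas for pipes lets PATHITEM pull out matching elements. This is a hedged sketch of a calculated column; the table name 'Addresses' is an assumption, and it assumes both columns always hold the same number of comma-separated items:

```dax
Concatenated Address =
VAR Nums  = SUBSTITUTE('Addresses'[Street Number], ", ", "|")
VAR Names = SUBSTITUTE('Addresses'[Street Name], ", ", "|")
RETURN
    CONCATENATEX(
        GENERATESERIES(1, PATHLENGTH(Nums)),
        -- pair the Nth street number with the Nth street name
        PATHITEM(Nums, [Value]) & " " & PATHITEM(Names, [Value]),
        ", "
    )
```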
Split Column into Rows is your key... it is a long shot to explain, therefore check the sample file and let me know if it works for you...
Sample File

Power BI - Stacked chart sort by

I have the below chart, which is sorted by the count for each of the bins. I am trying to sort it by days open instead, starting from 0 - 5 days, then 5 - 10 days, etc.
I have added another table that has IDs for each of the bins (0 - 5 days is 1, 5 - 10 days is 2) but I am unable to use it for sorting.
Any ideas?
I always do this by adding a dimension table for sorting purposes. The Dim table for bins would look like this:
Then go to the Data pane and set it up as shown in the picture below.
Select the Bin name column
Choose Modeling from the menu
Choose Sort by column and select the Bin order column
Then connect the Dim table to the Fact table:
When building the visual, choose Bin name from the Dim table, not the Fact table!
Then the final step is to set up sorting in the visual:
Here you have the Dim and Fact tables to reproduce the exercise.
Dim Table:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcsrMUzDQNVXSUTJUitWB8E11DQ2AAkZwAUMDXSOQiDFcxAioByRigtBkoGsIVmSK0GZkoA0UMFOKjQUA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [#"Bin name" = _t, #"Bin order" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Bin name", type text}, {"Bin order", Int64.Type}})
in
#"Changed Type"
Fact Table:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jdAxDoAgDAXQq5iu0qQtVmX1GoQDuHj/0SoJCWVh5KefR8kZrvtZCBUCMJRQz4pMFkgLmFC+ZGuJWKefUUL+h6zbekJrd3OV1EvHhBSnJHLU6ak0QaWRil5SBw2/pwPEMzvtHrLnlRc=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [#"Bin name" = _t, Frequency = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Bin name", type text}, {"Frequency", Int64.Type}})
in
#"Changed Type"
You should be able to do a Sort by Column under the Modeling tab where you sort your bin name column by the ID value column.
You need to:
Connect the bin columns (0 - 5), (5 - 10) from the two tables in your relationships.
In your second table, add a column called order: 1, 2, 3 for the bins (0 - 5), (5 - 10) respectively, and so on.
This should work.
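If you'd rather not type the order values by hand, the order column can also be derived from the bin name itself. This is a sketch for a calculated column on the dimension table, assuming every bin name starts with its lower bound followed by a space (e.g. "0 - 5 days"):

```dax
Bin order =
-- take the leading number before the first space, e.g. "0 - 5 days" -> 0
VALUE(LEFT('Dim'[Bin name], SEARCH(" ", 'Dim'[Bin name] & " ") - 1))
```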

Clean up table in Power BI

I am attempting to load several Excel files into Power BI. These files are pretty small (<= ~1k rows). One of these sources must be cleaned up. In particular, one of its columns has some bad data. The correct data is stored in another Excel file. For example:
table bad:
ID col1
1 0
2 0.5
3 2
4 -3
table correct:
ID colx
2 1
4 5
desired output:
ID col1
1 0
2 1
3 2
4 5
In SQL or other data visualization tools, I would left join the bad table to the clean table and then coalesce the bad values and correct values. I know that I have some options on how to implement this in Power BI. I think one option is to implement it in the query editor (i.e., M). I think another option is to implement it in the data model (i.e., DAX). Which option is best? And, what would the implementation look like (e.g., if M, then what does the query look like)?
While you can do this in DAX, I'd suggest doing it in the query editor. The steps would look roughly like this:
Merge the Correct table into the Bad table using a left outer join on the ID columns.
Expand out the Correct table to just get the Colx column.
Create a custom column to pick the values you want. (Add Column > Custom Column)
if [Colx] = null then [Col1] else [Colx]
You can remove the Col1 and Colx columns if you want or just keep them. If you delete Col1, you can rename the Col2 column to Col1.
If you don't want the source tables floating around, you can do all of the above in a single query similar to this:
let
BadSource = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTJQitWJVjICsfRMwWxjINsIzDIBsnSNlWJjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [ID = _t, Col1 = _t]),
CorrectSource = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlLSUTJUitWJVjIBskyVYmMB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [ID = _t, Colx = _t]),
Bad = Table.TransformColumnTypes(BadSource,{{"ID", Int64.Type}, {"Col1", type number}}),
Correct = Table.TransformColumnTypes(CorrectSource,{{"ID", Int64.Type}, {"Colx", type number}}),
#"Merged Queries" = Table.NestedJoin(Bad,{"ID"},Correct,{"ID"},"Correct",JoinKind.LeftOuter),
#"Expanded Correct" = Table.ExpandTableColumn(#"Merged Queries", "Correct", {"Colx"}, {"Colx"}),
#"Added Custom" = Table.AddColumn(#"Expanded Correct", "Col2", each if [Colx] = null then [Col1] else [Colx]),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Col1", "Colx"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Col2", "Col1"}})
in
#"Renamed Columns"
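For comparison, the DAX version of the same coalesce would be a calculated column on the bad table. This is a sketch, assuming the two tables are loaded as 'Bad' and 'Correct' with no relationship needed:

```dax
Col1 fixed =
COALESCE(
    -- look up this row's ID in the Correct table; blank if not found
    LOOKUPVALUE('Correct'[colx], 'Correct'[ID], 'Bad'[ID]),
    'Bad'[col1]
)
```

The query editor approach is still generally preferable here, since it fixes the data once at load time instead of recomputing in the model.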