I currently have two tables. The first table contains a list of locations like this:
Zagreb (Croatia)
Seattle, WA, USA
New York City, NY
Kazakhstan, Almaty
I also have a master list of 200k cities that looks like this:
Zagreb | Croatia
Seattle | USA
New York City | USA
Almaty | Kazakhstan
The output I want is to add a new column to the first table as below:
Zagreb (Croatia) | Croatia
Seattle, WA, USA | USA
New York City, NY | USA
Kazakhstan, Almaty | Kazakhstan
This is updated from a live source whose data quality I can't control, so any solution must be dynamic.
Any ideas appreciated!
One possible approach would be to add a custom column to the first table that searches the string for any cities that appear in the second table City column.
= Table.AddColumn(#"Changed Type", "City",
(L) => List.Select(Cities[City], each Text.Contains(L[Location], _)))
This gives a list of matching cities for each row. Expand that list so each match appears on its own row.
You can then merge with the Cities table (matching on the City columns from each table) to pull over the Country column.
Here's the full text of my query from the advanced editor:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WikpML0pNUtBwLspPLMlM1FSK1YlWCk5NLCnJSdVRCHfUUQgNdgQL+qWWK0TmF2UrOGeWVOoo+EWCRb0TqxKzM4pLEvN0FBxzchNLKpViYwE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Location = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Location", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "City", (L) => List.Select(Cities[City], each Text.Contains(L[Location], _))),
#"Expanded City" = Table.ExpandListColumn(#"Added Custom", "City"),
#"Merged Queries" = Table.NestedJoin(#"Expanded City",{"City"},Cities,{"City"},"Cities",JoinKind.LeftOuter),
#"Expanded Cities" = Table.ExpandTableColumn(#"Merged Queries", "Cities", {"Country"}, {"Country"})
in
#"Expanded Cities"
Name the first table "location", with a single column named "location".
Name the second table "city", with two columns named "city" and "country".
The code is:
let
    location = Excel.CurrentWorkbook(){[Name="location"]}[Content],
    city = Excel.CurrentWorkbook(){[Name="city"]}[Content],
    // {0}? returns null instead of raising an error when no city matches
    result = Table.AddColumn(location, "city", each Table.SelectRows(city, (x) => Text.Contains([location], x[city]))[country]{0}?)
in
    result
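The same contains-then-lookup logic, sketched in Python purely as an illustration (the sample data is copied from the question; note that naive substring matching can produce false positives on ambiguous city names):

```python
# Master list: city -> country (from the question's Cities table)
cities = {
    "Zagreb": "Croatia",
    "Seattle": "USA",
    "New York City": "USA",
    "Almaty": "Kazakhstan",
}

def find_country(location):
    # Return the country of the first master-list city found in the string,
    # or None when no city matches (the M analogue of {0}? above).
    for city, country in cities.items():
        if city in location:
            return country
    return None

locations = ["Zagreb (Croatia)", "Seattle, WA, USA",
             "New York City, NY", "Kazakhstan, Almaty"]
result = [(loc, find_country(loc)) for loc in locations]
```

With messy real-world data you may also want to normalize case and strip punctuation before matching.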
I'm working on a staff absence dashboard.
I'd like to know how many staff absences we have on a given day, based on the data below. I'd like to create a table with one row per day that counts the number of absences on that day. An absence should count on every date it covers: the start date, the end date, and any day in between.
Full Name Start Date End Date
----------------------------------
Employee D 03/11/2022 05/11/2022
Employee E 03/11/2022 04/11/2022
Employee A 04/11/2022 04/11/2022
Employee B 04/11/2022 06/11/2022
Employee C 04/11/2022 04/11/2022
Employee B 05/11/2022 06/11/2022
Based on the above table, I would expect the following:
Date Count
----------------
03/11/2022 2
04/11/2022 5
05/11/2022 3
06/11/2022 2
I'm using the formula below, but the result isn't counting properly. Could someone help me fix it?
Count per day = COUNTROWS(FILTER('Staff absence', 'Staff absence'[Absence Start Date]= MIN('Attendance Dates'[Date]) && 'Staff absence'[Absence End Date] >= MAX('Attendance Dates'[Date])))
It's best to fix this in your data model. In Power Query, join the absences to the date table to create a table with one row per employee per day absent. Then you can simply count the rows in that table for a given day.
Here's an example:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45Wcs0tyMmvTE1VcFHSUTIw1jc01DcyMDICcUzhnFgdJIWuGApNsCt0RJUDc4yMoApjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Full Name" = _t, #"Start Date" = _t, #"End Date" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Full Name", type text}, {"Start Date", type date}, {"End Date", type date}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Days", each List.Dates([Start Date], 1+Number.From([End Date]-[Start Date]), #duration(1, 0, 0, 0))),
#"Expanded Days" = Table.ExpandListColumn(#"Added Custom", "Days"),
#"Renamed Columns" = Table.RenameColumns(#"Expanded Days",{{"Days", "Day Absent"}}),
#"Removed Columns" = Table.RemoveColumns(#"Renamed Columns",{"Start Date", "End Date"})
in
#"Removed Columns"
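The expand-then-count idea above, sketched in Python as an illustration (sample data copied from the question):

```python
from datetime import date, timedelta
from collections import Counter

# One row per absence: (name, start date, end date), as in the question
absences = [
    ("Employee D", date(2022, 11, 3), date(2022, 11, 5)),
    ("Employee E", date(2022, 11, 3), date(2022, 11, 4)),
    ("Employee A", date(2022, 11, 4), date(2022, 11, 4)),
    ("Employee B", date(2022, 11, 4), date(2022, 11, 6)),
    ("Employee C", date(2022, 11, 4), date(2022, 11, 4)),
    ("Employee B", date(2022, 11, 5), date(2022, 11, 6)),
]

# Expand each range into one entry per absent day, then count per day
counts = Counter()
for _name, start, end in absences:
    d = start
    while d <= end:
        counts[d] += 1
        d += timedelta(days=1)
```

This reproduces the expected table in the question: 2 absences on 03/11, 5 on 04/11, 3 on 05/11, and 2 on 06/11.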
I have rows of data which can have information in multiple columns that I need to extract and convert into an individual row for each.
E.g.
Original table
Headers are:
Product Code | Description | Location 1 | Location 2 | Location 3
and I need to convert it to:
Product Code | Description | Location
Some products will be available in multiple regions.
If a product is available in Germany and France, there may be a DE in the Location 1 column and an FR in the Location 2 column, while the Location 3 column will be blank.
I need to convert it so that there is a single location column with corresponding entries for each region that product had.
Desired output table
Is there a way to automate this in Power Bi?
Select the Code and description columns then
UnpivotOtherColumns
Remove the blank entries
Remove the Attribute column
Not sure how you want your results sorted, but you could easily add a sort step to the code below.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUfJILAGSLq5AAoRidaKVjICM4OTEojQg7RYElwVJGQMZ7jn5ZanFQEaoN5KC2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Product Code" = _t, Description = _t, #"Location 1" = _t, #"Location 2" = _t, #"Location 3" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Product Code", Int64.Type}, {"Description", type text}, {"Location 1", type text}, {"Location 2", type text}, {"Location 3", type text}}),
//Select the Product Code and Description columns,
//then use "Unpivot Other Columns"
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Changed Type",
{"Product Code", "Description"}, "Attribute", "Location"),
//Remove the blank locations and the "Attribute" column
#"Filtered Rows" = Table.SelectRows(#"Unpivoted Other Columns", each ([Location] <> "")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Attribute"})
in
#"Removed Columns"
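The unpivot step can be sketched in plain Python as follows (the product codes and descriptions are made-up sample values, since the question doesn't show its data):

```python
# Each row: (product code, description, location 1, location 2, location 3);
# empty strings stand for blank cells.
rows = [
    (1001, "Widget", "DE", "FR", ""),
    (1002, "Gadget", "UK", "", ""),
]

# "Unpivot other columns": one output row per non-blank location
unpivoted = [
    (code, desc, loc)
    for code, desc, *locs in rows
    for loc in locs
    if loc != ""
]
```

Each product now appears once per region, which is exactly what Table.UnpivotOtherColumns plus the blank filter achieves in M.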
I'm using PowerBI and looking to summarize (average) data over a period of time; however, I realized that my source data doesn't include "empty" (zero-total) date values. These are valid and required for accurately aggregating totals over a period of time.
I've created a new date table using the following expression, to create all the dates within the preliminary tables range:
Date_Table = CALENDAR(MIN('SalesTable'[Daily Sales Date]),MAX('SalesTable'[Daily Sales Date]))
However, when trying to create a relationship between the created table and the original SalesTable to fill in the "missing dates", I haven't been successful. If anyone has encountered a similar issue and has any advice, or could point me toward resources to resolve this, I would be greatly appreciative.
I've included an example of my current and expected results below. Thanks!
current:

| Item Group | Daily Sales Date | Total |
|-|-|-|
| Fruit | January 1 | 5 |
| Vegetable | January 5 | 10 |

expected:

| Item Group | Daily Sales Date | Total |
|-|-|-|
| Fruit | January 1 | 5 |
| Fruit | January 2 | 0 |
| Fruit | January 3 | 0 |
| Fruit | January 4 | 0 |
| Fruit | January 5 | 0 |
| Vegetable | January 1 | 0 |
| Vegetable | January 2 | 0 |
| Vegetable | January 3 | 0 |
| Vegetable | January 4 | 0 |
| Vegetable | January 5 | 10 |
To do this in Power Query as you request, you can create the Date Table in Power Query, then Join it with each group in the Item Group column:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcisqzSxR0lHySswrTSyqVDAEsk2VYnWilcJS01NLEpNyUpFkTYFsQwOl2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Item Group" = _t, #"Daily Sales Date" = _t, Total = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{
{"Item Group", type text},
{"Daily Sales Date", type date},
{"Total", Int64.Type}}),
//create a table with a list of all dates for the date range in the table
allDates = Table.FromColumns({
List.Dates(List.Min(#"Changed Type"[Daily Sales Date]),
Duration.Days(List.Max(#"Changed Type"[Daily Sales Date]) - List.Min(#"Changed Type"[Daily Sales Date]))+1,
#duration(1,0,0,0))},type table[Dates=date]),
//group by the item group column
//Then join each subtable with the allDates table
group = Table.Group(#"Changed Type",{"Item Group"},{
{"Daily Sales Date", each Table.Join(_,"Daily Sales Date",allDates,"Dates",JoinKind.RightOuter)}
}),
//Expand the grouped table
#"Expanded Daily Sales Date" = Table.ExpandTableColumn(group, "Daily Sales Date", {"Total", "Dates"}, {"Total", "Dates"}),
//replace the nulls with zeros
#"Replaced Value" = Table.ReplaceValue(#"Expanded Daily Sales Date",null,0,Replacer.ReplaceValue,{"Total"}),
//set proper column order and types
#"Reordered Columns" = Table.ReorderColumns(#"Replaced Value",{"Item Group", "Dates", "Total"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Reordered Columns",{{"Dates", type date}, {"Total", Int64.Type}})
in
#"Changed Type1"
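The group-by-item, join-to-all-dates logic above can be sketched in Python as follows (the year is my own assumption, since the question only shows "January 1" etc.):

```python
from datetime import date, timedelta

# Sample rows: (item group, daily sales date, total), per the question
sales = [("Fruit", date(2015, 1, 1), 5), ("Vegetable", date(2015, 1, 5), 10)]

# Build the full daily range between the min and max dates in the data
start = min(d for _, d, _ in sales)
end = max(d for _, d, _ in sales)
all_dates = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# For every (group, date) pair, take the recorded total or default to 0
totals = {(g, d): t for g, d, t in sales}
groups = sorted({g for g, _, _ in sales})
filled = [(g, d, totals.get((g, d), 0)) for g in groups for d in all_dates]
```

Every group now has a row for every date in the range, with zeros filled in, matching the expected output.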
If you wanted to average over the existing date range, you can try this:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcisqzSxR0lHySswrTSyqVDAEsk2VYnWilcJS01NLEpNyUpFkTYFsQwOl2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Item Group" = _t, #"Daily Sales Date" = _t, Total = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{
{"Item Group", type text},
{"Daily Sales Date", type date},
{"Total", Int64.Type}}),
//count the number of dates
numDates = Duration.Days(List.Max(#"Changed Type"[Daily Sales Date]) - List.Min(#"Changed Type"[Daily Sales Date]))+1,
//group by Item Group, then average using Sum/Number of dates for each subgroup
#"Grouped Rows" = Table.Group(#"Changed Type", {"Item Group"}, {
{"Average", each List.Sum([Total])/numDates}})
in
#"Grouped Rows"
There are numerous other ways to accomplish this as well.
I'm having trouble defining a smart data replacement in Power Query.
I'm querying data from SharePoint, from multiple lists, to create the desired report.
When a column contains only one number, I use the Merge Queries function as a "vlookup" replacement.
The issue starts when a column contains multiple numbers separated by semicolons.
Example
Source list:
| Unique ID | Name | Assignees_ID|
|-|-|-|
| Epic1 | Blabla1| 1 |
|Epic2 | Blabla2| 1;2;3|
"Vlookup_list" query:
|Assignees_ID|Assignees_Names|
|-|-|
|1|Mark|
|2|Irina|
|3|Bart|
Expected output:
| Unique ID | Name | Assignees_ID |Assignees_Names |
|-|-|-| - |
| Epic1 | Blabla1| 1 | Mark|
|Epic2 | Blabla2| 1;2;3| Mark; Irina; Bart|
So is there a smart way to perform such a transformation? I've tried several approaches, but my knowledge is too limited to pull it off.
Kind regards
Bartosz
In Power Query:
Load the Vlookup_list into Power Query. Name the query VlookupNamesQuery, then File > Close and Load To... > Create Connection Only.
Load the example source list into Power Query.
Right-click the Assignees_ID column and split by each semicolon into rows.
Merge in VlookupNamesQuery, matching on the ID with a left outer join. Expand using the arrows atop the column to get Assignees_Names.
Group on UniqueID and Name. Use Home > Advanced Editor to modify the code to use Text.Combine to reassemble the values that were split, as below.
let Source = Excel.CurrentWorkbook(){[Name="ExampleSourceListRange"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"UniqueID", type text}, {"Name", type text}, {"Assignees_ID", type text}}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type", {{"Assignees_ID", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Assignees_ID"),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Assignees_ID", Int64.Type}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type1",{"Assignees_ID"},VlookupNamesQuery,{"Assignees_ID"},"Names",JoinKind.LeftOuter),
#"Expanded Names" = Table.ExpandTableColumn(#"Merged Queries", "Names", {"Assignees_Names"}, {"Assignees_Names"}),
#"Grouped Rows" = Table.Group(#"Expanded Names", {"UniqueID", "Name"}, {
{"Assignees_ID", each Text.Combine(List.Transform([Assignees_ID], Text.From), ";"), type text},
{"Assignees_Names", each Text.Combine(List.Transform([Assignees_Names], Text.From), "; "), type text}
})
in #"Grouped Rows"
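The split, look up, and recombine steps can be sketched in Python as an illustration (data copied from the question):

```python
# The "Vlookup_list": Assignees_ID -> Assignees_Names
names = {1: "Mark", 2: "Irina", 3: "Bart"}

# Source rows: (Unique ID, Name, semicolon-separated Assignees_ID)
rows = [("Epic1", "Blabla1", "1"), ("Epic2", "Blabla2", "1;2;3")]

result = []
for uid, name, ids in rows:
    # Split on ";", look up each ID, then rejoin the names
    looked_up = [names[int(i)] for i in ids.split(";")]
    result.append((uid, name, ids, "; ".join(looked_up)))
```

This is the same split-to-rows, merge, and Text.Combine round trip that the M query performs.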
Suppose I have a PowerBI date table, with some dates missing, similar to the following:
|---------------------|------------------|
| Date | quantity |
|---------------------|------------------|
| 1/1/2015 | 34 |
|---------------------|------------------|
| 1/4/2015 | 34 |
|---------------------|------------------|
Is there an M formula that would add the missing date rows (and just put in null for the second column), resulting in a table like below:
|---------------------|------------------|
| Date | quantity |
|---------------------|------------------|
| 1/1/2015 | 34 |
|---------------------|------------------|
| 1/2/2015 | null |
|---------------------|------------------|
| 1/3/2015 | null |
|---------------------|------------------|
| 1/4/2015 | 34 |
|---------------------|------------------|
I know this could be accomplished by merging a full [dates] table with my dataset, but that is not an option in my scenario. And I need to do this in M, during query manipulation, and not in DAX.
Appreciate the help!
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
Base = Table.TransformColumnTypes(Source,{{"Date", type date}, {"quantity", Int64.Type}}),
// Generate list of dates between Max and Min dates of Table1
DateRange = Table.Group(Base, {}, {{"MinDate", each List.Min([Date]), type date}, {"MaxDate", each List.Max([Date]), type date}}),
StartDate = DateRange[MinDate]{0},
EndDate = DateRange[MaxDate]{0},
// avoid naming this step "List", which would shadow the standard List module
DateList = {Number.From(StartDate)..Number.From(EndDate)},
#"Converted to Table" = Table.FromList(DateList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
FullList = Table.TransformColumnTypes(#"Converted to Table",{{"Column1", type date}}),
//Right Anti Join to find dates not in original Table1
#"Merged Queries" = Table.NestedJoin(Base,{"Date"},FullList,{"Column1"},"Table2",JoinKind.RightAnti),
#"Removed Other Columns" = Table.SelectColumns(#"Merged Queries",{"Table2"}),
Extras = Table.ExpandTableColumn(#"Removed Other Columns", "Table2", {"Column1"}, {"Date"}),
Combined = Table.Sort(Base & Extras, {{"Date", Order.Ascending}})
in Combined
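The anti-join idea above, sketched in Python as an illustration (sample data copied from the question):

```python
from datetime import date, timedelta

# Sample data standing in for Table1
table1 = [(date(2015, 1, 1), 34), (date(2015, 1, 4), 34)]

# Full daily range between the min and max dates in the table
start = min(d for d, _ in table1)
end = max(d for d, _ in table1)
full = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# "Right anti join": dates in the full range but not in the original table
have = {d for d, _ in table1}
extras = [(d, None) for d in full if d not in have]

# Append the missing-date rows (with null quantities) and sort by date
combined = sorted(table1 + extras)
```

The missing dates get a null quantity, exactly as requested.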
Here's another way:
I start with a table named Table2 in an Excel worksheet and use it as my source.
In Power BI, use Get Data, select All > Excel, click Connect, then navigate to the Excel file containing the source table, select it, and click Open. Select Table2 (the name of the table to use) from the tables presented, and click Edit. This loads Table2 as the source.
The second and third lines in my M code below (Source and Table2_Table) are what those steps generate; they navigate to the table and load it. They will differ for you, since your source path, file, and table names will be different.
let
Source = Excel.Workbook(File.Contents("mypath\myfile.xlsx"), null, true),
Table2_Table = Source{[Item="Table2",Kind="Table"]}[Data],
#"Generate Dates" = List.Generate(()=> Date.From(List.Min(Table2_Table[Date])), each _ <= Date.From(List.Max(Table2_Table[Date])), each Date.AddDays(DateTime.Date(_), 1)),
#"Converted to Table" = Table.FromList(#"Generate Dates", Splitter.SplitByNothing(), {"Date"}, null, ExtraValues.Error),
#"Merged Queries" = Table.NestedJoin(#"Converted to Table",{"Date"},Table2_Table,{"Date"},"Converted to Table",JoinKind.LeftOuter),
#"Expanded Converted to Table" = Table.ExpandTableColumn(#"Merged Queries", "Converted to Table", {"Quantity"}, {"Quantity"})
in
#"Expanded Converted to Table"
This produces the filled-in date table as output, which I can then use in PowerBI.
P.S. I noticed that when using this in Power Query from within Excel only, rather than from within PowerBI, I need to explicitly change the type of the date fields, or else the merge won't work correctly and the Quantity numbers won't appear. So if doing this only from within Excel, this code change seems to work:
let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}}),
#"Generate Dates" = List.Generate(()=> Date.From(List.Min(#"Changed Type"[Date])), each _ <= Date.From(List.Max(#"Changed Type"[Date])), each Date.AddDays(DateTime.Date(_), 1)),
#"Converted to Table" = Table.FromList(#"Generate Dates", Splitter.SplitByNothing(), {"Date"}, null, ExtraValues.Error),
#"Changed Type1" = Table.TransformColumnTypes(#"Converted to Table",{{"Date", type date}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type1",{"Date"},#"Changed Type",{"Date"},"Converted to Table",JoinKind.LeftOuter),
#"Expanded Converted to Table" = Table.ExpandTableColumn(#"Merged Queries", "Converted to Table", {"Quantity"}, {"Quantity"})
in
#"Expanded Converted to Table"
Of course, it probably wouldn't hurt to explicitly assign the date types when working within PowerBI as well...just in case.