I've been attempting to extract the geometry from an ESRI REST endpoint. It returns JSON, and I can drill down to the point where each row is a list of lists. I would like to concatenate all the points into a new row that Power BI can display, which I believe is in the format
POLYGON((lon lat, lon2 lat2, lon3 lat3, lon4 lat4))
There is not a set number of points per polygon. In the example below there are 4 points that make up the polygon. If you check the endpoint URL there are many polygons.
"geometry": {
"rings": [
[
[
-91.477749413304764,
31.470175721032774
],
[
-91.477709210911314,
31.470214015064812
],
[
-91.477676009740037,
31.470105771763997
],
[
-91.477749413304764,
31.470175721032774
]
]
]
}
Here is my current code
let
Source = Json.Document(Web.Contents("https://gispublic.ducks.org/arcgis/rest/services/WMU/PastUnits/MapServer/0/query?where=1%3D1&text=&objectIds=&time=&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&distance=&units=esriSRUnit_Foot&relationParam=&outFields=*&returnGeometry=true&returnTrueCurves=false&maxAllowableOffset=&geometryPrecision=&outSR=&havingClause=&returnIdsOnly=false&returnCountOnly=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&gdbVersion=&historicMoment=&returnDistinctValues=false&resultOffset=&resultRecordCount=&returnExtentOnly=false&datumTransformation=¶meterValues=&rangeValues=&quantizationParameters=&featureEncoding=esriDefault&f=pjson"), 65001),
features = Source[features],
#"convertedtoTable" = Table.FromList(features, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"expandedColumn" = Table.ExpandRecordColumn(#"convertedtoTable", "Column1", {"geometry"}, {"geometry"}),
geo = Table.ExpandRecordColumn(expandedColumn, "geometry", {"rings"}, {"geometry.rings"}),
#"geometry rings" = geo[geometry.rings],
#"Converted to Table" = Table.FromList(#"geometry rings", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
Column1 = #"Converted to Table"[Column1],
#"Converted to Table1" = Table.FromList(Column1, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Expanded Column1" = Table.ExpandListColumn(#"Converted to Table1", "Column1")
in
#"Expanded Column1"
You can:
- "drill down" a little further so that each list just contains a single polygon
- extract the list into a delimited array
- split the array into columns, where each row represents a polygon
With regard to the "splitting", if you will be doing this once, you can just accept the code generated by the UI.
If you will be doing this multiple times, and there might be a different number of polygons in each run, it would be more efficient to calculate the number of columns needed.
But here is specimen code, replacing everything after your #"geometry rings" step, shortened here since over 4,000 columns are generated.
#"geometry rings" = geo[geometry.rings],
//drill down to combine each ring into a single list
comb1 = List.Transform(#"geometry rings", each List.Combine(_)),
comb2 = List.Transform(comb1, each List.Combine(_)),
//convert to table
rings = Table.FromList(comb2,Splitter.SplitByNothing(),{"Rings"}),
#"Extracted Values" = Table.TransformColumns(rings, {"Rings", each Text.Combine(List.Transform(_, Text.From), ";"), type text}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Extracted Values", "Rings", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), {"Rings.1", "Rings.2", ...,
#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Rings.1", type number}, {"Rings.2", type number}, ...
in
#"Changed Type
Edit: shortened code for a dynamic number of columns
#"geometry rings" = geo[geometry.rings],
//drill down to combine each ring into a single list
comb1 = List.Transform(#"geometry rings", each List.Combine(_)),
comb2 = List.Transform(comb1, each List.Combine(_)),
numCols = List.Accumulate(comb2,
0,
(state,current)=> if state > List.Count(current) then state else List.Count(current)),
//convert to table
rings = Table.FromList(comb2,Splitter.SplitByNothing(),{"Rings"}),
#"Extracted Values" = Table.TransformColumns(rings, {"Rings", each Text.Combine(List.Transform(_, Text.From), ";"), type text}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Extracted Values", "Rings",
Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), numCols),
colTypes = List.Transform(Table.ColumnNames(#"Split Column by Delimiter"), each {_, Number.Type}),
#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",colTypes)
in
#"Changed Type"
Related
When I try to load the query below with the entire data set (i.e., around 100,000 rows in total), it errors with:
"Expression.Error column "Column1" not found" (Column1 refers to a column which I expanded in a function and then call in the query)
When I limit the dataset to call only ~900 rows, I do NOT get the error and the dataset loads perfectly.
My conclusion is that there is NOT ACTUALLY an error in my query, but somehow the size of the dataset is causing this.
QUESTION: How can I prevent this error and get my whole dataset to load?
*********QUERY THAT DOES NOT LOAD:
let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Invoked Custom Function" = Table.AddColumn(Source, "FnGet", each FnGet([companyID])),
#"Expanded FnGet" = Table.ExpandTableColumn(#"Invoked Custom Function", "FnGet", {"date", "articleUrl", "title", "summary", "reads"}, {"date", "articleUrl", "title", "summary", "reads"}) in
#"Expanded FnGet"
***QUERY THAT HAS NO ERROR ON LOAD:
NOTE STEP 1 "Kept First Rows" which reduces the dataset to approx 1000 rows
let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
#"Kept First Rows" = Table.FirstN(Source,10),
#"Invoked Custom Function" = Table.AddColumn(#"Kept First Rows", "FnGet", each FnGet([companyID])),
#"Expanded FnGet" = Table.ExpandTableColumn(#"Invoked Custom Function", "FnGet", {"date", "articleUrl", "title", "summary", "reads"}, {"date", "articleUrl", "title", "summary", "reads"}) in
#"Expanded FnGet"
*************** ADDITIONAL INFORMATION ******************
The query references a function "FnGet" which references Column1 in step 3:
let
Source = (companyID as any) => let
Source = Json.Document(Web.Contents("https://www.lexology.com/api/v1/track/clients/articles?companyId=" & Number.ToText(companyID)&"&limit=100" , [Headers=[ApiKey="ABCD12345"]])),
#"Converted to Table" = Record.ToTable(Source),
Value = #"Converted to Table"{7}[Value],
#"Converted to Table1" = Table.FromList(Value, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table1", "Column1", {"date", "articleUrl", "title", "summary", "reads"}, {"date", "articleUrl", "title", "summary", "reads"})
in
#"Expanded Column1"
in
Source
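One defensive option, offered only as a hedged sketch (it does not pin down the root cause): wrap the drill-down inside FnGet in a try ... otherwise, so any row whose API response does not have the expected shape returns an empty table with the expected columns instead of breaking the expand step. The empty-table fallback, and the assumption that a malformed response is what triggers the error on the full dataset, are both mine:
let
    Source = (companyID as any) =>
    let
        Raw = Json.Document(Web.Contents("https://www.lexology.com/api/v1/track/clients/articles?companyId=" & Number.ToText(companyID) & "&limit=100", [Headers=[ApiKey="ABCD12345"]])),
        // fallback table with the columns the outer query expands
        Empty = #table({"date", "articleUrl", "title", "summary", "reads"}, {}),
        Result = try
            let
                Value = Record.ToTable(Raw){7}[Value],
                AsTable = Table.FromList(Value, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
                Expanded = Table.ExpandRecordColumn(AsTable, "Column1", {"date", "articleUrl", "title", "summary", "reads"}, {"date", "articleUrl", "title", "summary", "reads"})
            in
                Expanded
        otherwise Empty
    in
        Result
in
    Source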
This is what my table looks like (1.7 million rows):
I'm trying to build a running total per customer ID and date.
This is easy to express using DAX, but unfortunately I don't have enough memory on my machine (16GB RAM).
So, I'm trying to find an alternative in Power Query M using buffered tables, etc., but that is too complicated for me.
Can anyone help? Thank you so much in advance!
EDIT: After sorting by Date and CustomerID, I added an index and then a custom column with:
= Table.AddColumn(#"Added Index", "Personalizado", each (i)=>List.Sum(Table.SelectRows(#"Added Index", each [CustomerID]=i[CustomerID] and [Index]<=i[Index]) [Sales]))
I get the following:
EDIT2:
The whole code:
let
Origem = dataset,
#"Linhas Agrupadas" = Table.Group(Origem, {"Date", "CustomerID"}, {{"Sales", each List.Sum([Sales]), type nullable number}}),
#"Linhas Ordenadas" = Table.Sort(#"Linhas Agrupadas",{{"Date", Order.Ascending}, {"CustomerID", Order.Ascending}}),
#"Linhas Filtradas" = Table.SelectRows(#"Linhas Ordenadas", each [Sales] <> 0),
#"Added Index" = Table.AddIndexColumn(#"Linhas Filtradas", "Index", 0, 1, Int64.Type),
#"Personalizado Adicionado" = Table.AddColumn(#"Added Index","CumSum",(i)=>List.Sum(Table.SelectRows(#"Added Index", each [CustomerID]=i[CustomerID] and [Index]<=i[Index]) [Sales]), type number )
in
#"Personalizado Adicionado"
Method 1:
Sort your data to start with, perhaps on the Date and CustomerID columns. Whatever row order appears on screen is the order in which the total will accumulate.
Add column .. index column...
Add column .. custom column with formula
= (i)=>List.Sum(Table.SelectRows(#"Added Index", each [CustomerID]=i[CustomerID] and [Index]<=i[Index]) [Sales])
Right click index column and remove it
Likely adding a Table.Buffer() around the index step will help speed things up
Sample full code:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Sorted Rows" = Table.Sort(Source,{{"CustomerID", Order.Ascending}, {"Date", Order.Ascending}}),
#"Added Index" = Table.Buffer(Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1)),
#"Added Custom" = Table.AddColumn(#"Added Index","CumSum",(i)=>List.Sum(Table.SelectRows(#"Added Index", each [CustomerID]=i[CustomerID] and [Index]<=i[Index]) [Sales]), type number ),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Index"})
in #"Removed Columns"
Method 2:
Create a function fn_cum_total:
(Input) =>
let withindex = Table.AddIndexColumn(Input, "Index", 1, 1),
cum = Table.AddColumn(withindex, "Total",each List.Sum(List.Range(withindex[Sales],0,[Index])))[Total]
in cum
Create a query that uses that function to add cumulative totals to the Sales column after grouping on CustomerID:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Sorted Rows" = Table.Buffer(Table.Sort(Source,{{"CustomerID", Order.Ascending}, {"Date", Order.Ascending}})),
Running_Total = Table.Group(#"Sorted Rows",{"CustomerID"},{{"Data",
(Input as table) as table => let zz = fn_cum_total(Input),
result = Table.FromColumns(Table.ToColumns(Input)&{zz}, Value.Type(Table.AddColumn(Input, "total", each null, type number))) in result, type table}} ),
#"Expanded Data" = Table.ExpandTableColumn(Running_Total, "Data", {"Date", "Sales", "total"}, {"Date", "Sales", "total"})
in #"Expanded Data"
I cannot take credit for Method 2; I borrowed it long ago but do not recall the source.
I'm trying to scrape some data from a website, but I need the date/time included in the span tag's title attribute, as shown below:
<span class="hourAgo ng-binding" title="07/07/2020 às 09:43:33">Há 3 horas</span>
The Power Query looks like this:
let
Source = Web.BrowserContents("https://www.reclameaqui.com.br/empresa/nestle/lista-reclamacoes/"),
#"Extracted Table From Html" =
Html.Table(Source, {{
"Column1", ".text-title"
}, {
"Column2", ".text-description"
}, {
"Column3", ".status-text"
}, {
"Column4", ".hourAgo" <<<<<<< Here's the class selector I got, but I need the title content
}, {
"Column5", ".mdi-map-marker + *"
}},
[RowSelector=".complain-list:nth-child(1) LI"]),
#"Changed Type" = Table.TransformColumnTypes(#"Extracted Table From Html",{{
"Column1", type text
}, {
"Column2", type text
}, {
"Column3", type text
}, {
"Column4", type text
}, {
"Column5", type text
}})
in
#"Changed Type"
All the other columns are fine. So far, that code returns the "Há 3 horas" ("3 hours ago") span content.
You'll probably want to extract that title separately since it's within the span tag, which means you'll have to parse the site as text rather than HTML.
Split the HTML text by line feed (new line).
Convert to a table.
Filter to get just the rows containing "hourAgo".
Extract the date between the quotes.
let
Source = Web.BrowserContents("https://www.reclameaqui.com.br/empresa/nestle/lista-reclamacoes/"),
#"Split Text" = Text.Split(Source, "#(lf)"),
#"Converted to Table" = Table.FromList(#"Split Text", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Filtered Rows" = Table.SelectRows(#"Converted to Table", each Text.Contains([Column1], "hourAgo")),
#"Extracted Text Between Delimiters" = Table.TransformColumns(#"Filtered Rows", {{"Column1", each Text.BetweenDelimiters(_, "<span class=""hourAgo ng-binding"" title=""", """>"), type text}})
in
#"Extracted Text Between Delimiters"
You can also change this slightly to include one of the other columns so you can merge back with your original table:
let
Source = Web.BrowserContents("https://www.reclameaqui.com.br/empresa/nestle/lista-reclamacoes/"),
#"Split Text" = Text.Split(Source, "#(lf)"),
#"Converted to Table" = Table.FromList(#"Split Text", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Filtered Rows" = Table.SelectRows(#"Converted to Table", each Text.Contains([Column1], "hourAgo")),
#"Split Column by Delimiter" = Table.SplitColumn(#"Filtered Rows", "Column1", Splitter.SplitTextByDelimiter("</span>", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10", "Column1.11"}),
#"Removed Other Columns" = Table.SelectColumns(#"Split Column by Delimiter",{"Column1.4", "Column1.7"}),
#"Extracted Text After Delimiter" = Table.TransformColumns(#"Removed Other Columns", {{"Column1.4", each Text.BetweenDelimiters(_, "title=", ">"), type text}, {"Column1.7", each Text.Trim(Text.AfterDelimiter(_, ">")), type text}})
in
#"Extracted Text After Delimiter"
Please help me create a function in Power Query.
At one of the steps of the query I get a list of dates. Some are consecutive, some stand alone. The quantity is not fixed.
Example (MM/DD/YYYY):
{01/01/2019,
01/02/2019,
01/03/2019,
01/05/2019,
01/06/2019,
01/08/2019}
I need to determine all intervals of consecutive dates and return the list of such intervals. An interval is given by its start and end dates. If a date stands alone, it is both the beginning and the end of its interval.
An example from the previous data:
{{01/01/2019, 01/03/2019},
{01/05/2019, 01/06/2019},
{01/08/2019, 01/08/2019}}
Please help me write a function to solve this problem.
In my data there are about 10,000 rows, each of which has an attached list of up to 365 days. Ideally the function should work quickly.
It feels like List.Generate can help, but I don't understand this function very well.
This function, which I called Parse Dates, should do it:
(dateList) =>
let
#"Converted to Table" = Table.FromList(dateList, Splitter.SplitByNothing(), {"Dates"}, null, ExtraValues.Error),
#"Changed Type" = Table.TransformColumnTypes(#"Converted to Table",{{"Dates", type date}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
#"Added Start" = Table.AddColumn(#"Added Index", "Start", each try if #"Added Index"{[Index]-1}[Dates] = Date.AddDays([Dates],-1) then null else [Dates] otherwise [Dates]),
#"Added End" = Table.AddColumn(#"Added Start", "End", each try if #"Added Start"{[Index]+1}[Dates] = Date.AddDays([Dates],1) then null else [Dates] otherwise [Dates]),
#"Added Custom1" = Table.AddColumn(#"Added End", "Group", each "Group"),
#"Grouped Rows" = Table.Group(#"Added Custom1", {"Group"}, {{"Start", each List.RemoveNulls([Start]), type anynonnull}, {"End", each List.RemoveNulls([End]), type anynonnull}}),
#"Added Custom2" = Table.AddColumn(#"Grouped Rows", "Tabled", each Table.FromColumns({[Start],[End]},{"Start","End"})),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom2",{"Tabled"}),
#"Expanded Tabled" = Table.ExpandTableColumn(#"Removed Other Columns", "Tabled", {"Start", "End"}, {"Start", "End"}),
#"Added Custom3" = Table.AddColumn(#"Expanded Tabled", "Custom", each List.Dates([Start],Number.From([End]-[Start])+1,#duration(1,0,0,0))),
#"Removed Other Columns1" = Table.SelectColumns(#"Added Custom3",{"Custom"})
in
#"Removed Other Columns1"
I invoked it with this:
let
Source = #"Parse Dates"(#"Dates List")
in
Source
...against this list, which I called Dates List:
...to get this result:
I managed to figure out how to use the List.Generate function to solve this problem.
This function works a little faster for me.
I called it fn_ListOfDatesToDateRanges.
To invoke it, you must pass it a column in which each row contains a list of dates.
Information from the KenR blog helped me with the development.
To compare performance, I used an array of about 250 thousand rows. The speed increase was 45 seconds versus about 1 minute (roughly -33%).
A test file with the functions used is here
(Dates)=>
let
InputData = List.Transform(List.Sort(Dates,Order.Ascending), each DateTime.Date(DateTime.From(_, "en-US"))),
DateRangesGen = List.Generate(
()=> [Date=null, Counter=0],
each [Counter]<=List.Count(InputData),
each [
Date =
let
CurrentRowDate = InputData{[Counter]},
PreviousRowDate = try InputData{[Counter]-1} otherwise null,
NextRowDate = try InputData{[Counter]+1} otherwise null,
MyDate = [Start_Date=
(if PreviousRowDate = null then CurrentRowDate else
if CurrentRowDate = Date.AddDays(Replacer.ReplaceValue(PreviousRowDate,null,0),1) then null else CurrentRowDate),
End_Date=(
if NextRowDate = null then CurrentRowDate else
if CurrentRowDate=Date.AddDays(Replacer.ReplaceValue(NextRowDate,null,0),-1) then null else CurrentRowDate)
]
in
MyDate,
Counter=[Counter]+1],
each [Date]),
//convert the generated records to a table of date ranges
AsTable = Table.FromList(DateRangesGen, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
Expanded = Table.ExpandRecordColumn(AsTable, "Column1", {"Start_Date", "End_Date"}, {"Start_Date", "End_Date"}),
WithGroup = Table.AddColumn(Expanded, "Group", each "Group"),
Grouped = Table.Group(WithGroup, "Group", {{"Start_Date", each List.RemoveNulls([Start_Date]), type anynonnull}, {"End_Date", each List.RemoveNulls([End_Date]), type anynonnull}}),
Tabled = Table.AddColumn(Grouped, "Tabled", each Table.FromColumns({[Start_Date],[End_Date]},{"Start_Date","End_Date"})),
DateRanges = Table.ExpandTableColumn(Table.SelectColumns(Tabled,{"Tabled"}), "Tabled", {"Start_Date", "End_Date"}, {"Start_Date", "End_Date"})
in
DateRanges
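A minimal usage sketch, in case it helps: it assumes a source table with a column named DateList that holds a list of dates in each row (the table name, column name, and source are assumptions):
let
    Source = Excel.CurrentWorkbook(){[Name="DatesTable"]}[Content],  // hypothetical source table
    WithRanges = Table.AddColumn(Source, "Ranges", each fn_ListOfDatesToDateRanges([DateList])),
    Expanded = Table.ExpandTableColumn(WithRanges, "Ranges", {"Start_Date", "End_Date"})
in
    Expanded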
I have trouble transforming a list to a table.
So this:
{"123","1.1","321","12","345","345"}
should be transformed to this
{{"123","1.1"},{"321","12"},{"345","345"}}
It would be even better if there were a function that could also easily transform it to this:
{{"123","1.1","321"},{"12","345","345"}}
Might not be the best way:
let
ListToTable = (sourceList as list, columnCount as number) as table =>
let
#"Converted to Table" = Table.FromList(sourceList , Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Added Index" = Table.AddIndexColumn(#"Converted to Table", "Index", 0, 1),
#"Integer-Divided Column" = Table.TransformColumns(#"Added Index", {{"Index", each Number.IntegerDivide(_, columnCount), Int64.Type}}),
#"Grouped Rows" = Table.FromRecords(List.Transform(Table.Group(#"Integer-Divided Column", {"Index"}, {{"Rows", each Table.SelectColumns(_, "Column1"), type table}})[Rows], (rowTable) => Table.First(Table.Transpose(rowTable))))
in
#"Grouped Rows"
in
ListToTable({1,2,3,4,5,6}, 3)
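A shorter route, assuming List.Split is available in your version of Power Query: the desired {{"123","1.1"},{"321","12"},{"345","345"}} is just List.Split(list, 2), and Table.FromRows turns those chunks into rows. This sketch assumes the list length is a multiple of columnCount; otherwise the short last row will make Table.FromRows complain:
let
    // split the list into chunks of columnCount items and use each chunk as a table row
    ListToRows = (sourceList as list, columnCount as number) as table =>
        Table.FromRows(List.Split(sourceList, columnCount))
in
    ListToRows({"123","1.1","321","12","345","345"}, 2)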