Grouped running total with Power Query M - powerbi

This is what my table looks like (1.7 million rows):
I'm trying to build a running total per customer ID and date.
This is easy to express using DAX, but unfortunately I don't have enough memory on my machine (16GB RAM).
So, I'm trying to find an alternative with Power Query M using buffered tables, etc. but that is too complicated for me.
Can anyone help? Thank you so much in advance!
EDIT: After sorting by Date and CustomerID, added index and added a custom column with:
= Table.AddColumn(#"Added Index", "Personalizado", each (i)=>List.Sum(Table.SelectRows(#"Added Index", each [CustomerID]=i[CustomerID] and [Index]<=i[Index]) [Sales]))
I get the following:
EDIT2:
The whole code:
let
Origem = dataset,
#"Linhas Agrupadas" = Table.Group(Origem, {"Date", "CustomerID"}, {{"Sales", each List.Sum([Sales]), type nullable number}}),
#"Linhas Ordenadas" = Table.Sort(#"Linhas Agrupadas",{{"Date", Order.Ascending}, {"CustomerID", Order.Ascending}}),
#"Linhas Filtradas" = Table.SelectRows(#"Linhas Ordenadas", each [Sales] <> 0),
#"Added Index" = Table.AddIndexColumn(#"Linhas Filtradas", "Index", 0, 1, Int64.Type),
#"Personalizado Adicionado" = Table.AddColumn(#"Added Index","CumSum",(i)=>List.Sum(Table.SelectRows(#"Added Index", each [CustomerID]=i[CustomerID] and [Index]<=i[Index]) [Sales]), type number )
in
#"Personalizado Adicionado"

Method 1:
Sort your data first, perhaps on the Date and CustomerID columns. The running total accumulates in whatever row order appears on screen.
Add column .. index column...
Add column .. custom column with formula
= (i)=>List.Sum(Table.SelectRows(#"Added Index", each [CustomerID]=i[CustomerID] and [Index]<=i[Index]) [Sales])
Right click index column and remove it
Likely adding a Table.Buffer() around the index step will help speed things up
Sample full code:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Sorted Rows" = Table.Sort(Source,{{"CustomerID", Order.Ascending}, {"Date", Order.Ascending}}),
#"Added Index" = Table.Buffer(Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1)),
#"Added Custom" = Table.AddColumn(#"Added Index","CumSum",(i)=>List.Sum(Table.SelectRows(#"Added Index", each [CustomerID]=i[CustomerID] and [Index]<=i[Index]) [Sales]), type number ),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Index"})
in #"Removed Columns"
Method 2:
Create function fn_cum_total
(Input) =>
let withindex = Table.AddIndexColumn(Input, "Index", 1, 1),
cum = Table.AddColumn(withindex, "Total",each List.Sum(List.Range(withindex[Sales],0,[Index])))[Total]
in cum
Create query that uses that function to add cumulative totals to Sales column after grouping on CustomerID
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Sorted Rows" = Table.Buffer(Table.Sort(Source,{{"CustomerID", Order.Ascending}, {"Date", Order.Ascending}})),
Running_Total = Table.Group(#"Sorted Rows",{"CustomerID"},{{"Data",
(Input as table) as table => let zz = fn_cum_total(Input),
result = Table.FromColumns(Table.ToColumns(Input)&{zz}, Value.Type(Table.AddColumn(Input, "total", each null, type number))) in result, type table}} ),
#"Expanded Data" = Table.ExpandTableColumn(Running_Total, "Data", {"Date", "Sales", "total"}, {"Date", "Sales", "total"})
in #"Expanded Data"
I cannot take credit for Method 2; I borrowed it long ago and do not recall the source.
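Both methods above re-read earlier rows for every row they process, which may be slow on 1.7 million rows. As a hedged alternative (not part of the original answer), here is a minimal sketch that walks each customer's rows once with List.Generate. The inline sample table and the names RunningTotal, Grouped and Result are assumptions for illustration; the column names follow the question. It has not been benchmarked against the methods above.

```powerquery
let
    // Hypothetical sample data; replace with your own source step
    Source = #table(
        type table [CustomerID = text, Date = date, Sales = number],
        {
            {"A", #date(2022, 1, 1), 10},
            {"B", #date(2022, 1, 1), 5},
            {"A", #date(2022, 1, 2), 20},
            {"B", #date(2022, 1, 3), 7}
        }),
    // Illustrative helper: running total of a list in one pass
    RunningTotal = (values as list) as list =>
        let
            buffered = List.Buffer(values),
            n = List.Count(buffered)
        in
            List.Generate(
                () => [i = 0, total = buffered{0}],
                each [i] < n,
                // [i] and [total] refer to the previous state
                each [i = [i] + 1, total = [total] + buffered{[i] + 1}],
                each [total]),
    // One nested table per customer, sorted by date, with CumSum appended
    Grouped = Table.Group(Source, {"CustomerID"}, {{"Data", (t as table) as table =>
        let
            sorted = Table.Sort(t, {{"Date", Order.Ascending}})
        in
            Table.FromColumns(
                Table.ToColumns(sorted) & {RunningTotal(sorted[Sales])},
                Table.ColumnNames(sorted) & {"CumSum"}),
        type table}}),
    // Stack the per-customer tables back into one table
    Result = Table.Combine(Grouped[Data])
in
    Result
```

For the sample rows, customer A should end up with CumSum values 10 and 30, and customer B with 5 and 7 + 5 = 12. Null values in Sales are not handled in this sketch.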

Related

Promoting Headers from Rows to Columns

I am new to data analysis and I'm wondering if I can get pointers for what I am facing at the moment.
I have an ICS calendar that I am trying to export into a spreadsheet. However, the data I receive is organised as follows:
Data
Event: NAME XXX
Date: xx xx xx
Location: NOWHERE
URL: www.hi.com
Event: NAME YYY
Date: yy yy yy
Location: SOMEHWERE
URL: www.hello.com
... and so on
I need to be able to promote the text before the : delimiter on every four rows to headers, so that my data looks like this:
Event | Date | Location | URL
NAME X | xx xx xx | SOMEHWERE | hello.com
NAME Y | xx xx xx | NOWHERE | bye.com
I can use SQL or Python or data visualisation software such as PowerBI, alternatively, good ol' Excel works fine.
I tried other tools and workarounds, such as uploading the ICS calendar into my Outlook calendar and then exporting the calendar. This worked fine, but it is a workaround.
I would like to be able to load the information via the ICS link directly into a CSV/Excel because I am using the information to populate a PowerBI Dashboard.
This
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45Wci1LzSuxUvBz9HVViIiIUIrViVZySSxJtVKoqIAgsJBPfnJiSWZ+HlClf7iHa5ArWDQ0yMdKoby8XC8jUy85PxcshmxgZGQkkoGVlRCEZmCwv6+rRzimkak5OfkQU2MB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Data = _t]),
#"Split Column by Delimiter" = Table.SplitColumn(
Source, "Data", Splitter.SplitTextByDelimiter(":", QuoteStyle.Csv), {"Data.1", "Data.2"}),
#"Added Index" = Table.AddIndexColumn(
#"Split Column by Delimiter", "Index", 1, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(
#"Added Index", "Custom", each if Text.Contains([Data.1],"Event") then [Index] else null),
#"Filled Down" = Table.FillDown(
#"Added Custom",{"Custom"}),
#"Removed Columns" = Table.RemoveColumns(
#"Filled Down",{"Index"}),
#"Pivoted Column" = Table.Pivot(
#"Removed Columns", List.Distinct(#"Removed Columns"[Data.1]), "Data.1", "Data.2"),
#"Removed Columns1" = Table.RemoveColumns(
#"Pivoted Column",{"Custom"})
in
#"Removed Columns1"
is how to get from here:
to there:
In powerquery, try this on your sample data set provided above:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Filtered Rows1" = Table.SelectRows(Source, each [Column1] <> null),
#"Added Index" = Table.AddIndexColumn(#"Filtered Rows1", "Index", 0, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each if Text.Contains([Column1],"BEGIN:VEVENT") then [Index] else null),
#"Filled Down" = Table.FillDown(#"Added Custom",{"Custom"}),
#"Filtered Rows" = Table.SelectRows(#"Filled Down", each ([Custom] <> null)),
#"Removed Errors" = Table.RemoveRowsWithErrors(#"Filtered Rows", {"Custom"}),
#"Replaced Value" = Table.ReplaceValue(#"Removed Errors","ORGANIZER;CN=","ORGANIZER/CN:",Replacer.ReplaceText,{"Column1"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Replaced Value", "Column1", Splitter.SplitTextByEachDelimiter({":"}, QuoteStyle.Csv, false), {"Column1.1", "Column1.2"}),
#"Removed Columns" = Table.RemoveColumns(#"Split Column by Delimiter",{"Index"}),
removeHTML1 = Table.TransformColumns(#"Removed Columns",{{"Column1.2",each try Text.Combine(List.Select(List.Alternate(Text.SplitAny(_,"<>"),1,1,1), each _<>""), "") otherwise null, type text}}),
#"Pivoted Column" = Table.Pivot(removeHTML1, List.Distinct(removeHTML1[Column1.1]), "Column1.1", "Column1.2"),
extractEmail = Table.AddColumn(#"Pivoted Column", "email", each List.Last(Text.Split([#"ORGANIZER/CN"],":")))
in extractEmail
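For the simplified four-line sample in the question, the same reshaping can be sketched with readable inline data. This is an illustrative variant, not either answer's exact method: it assumes every event is exactly four rows (so the event number is the row index integer-divided by 4) and it splits on the first colon only, so values that contain a colon stay intact. The step names are hypothetical.

```powerquery
let
    // Sample rows copied from the question
    Source = #table(type table [Data = text], {
        {"Event: NAME XXX"}, {"Date: xx xx xx"}, {"Location: NOWHERE"}, {"URL: www.hi.com"},
        {"Event: NAME YYY"}, {"Date: yy yy yy"}, {"Location: SOMEHWERE"}, {"URL: www.hello.com"}}),
    // Text before the first ":" becomes the header name
    WithKey = Table.AddColumn(Source, "Key", each Text.BeforeDelimiter([Data], ":"), type text),
    // Text after the first ":" is the value; trim the leading space
    WithValue = Table.AddColumn(WithKey, "Value", each Text.Trim(Text.AfterDelimiter([Data], ":")), type text),
    Indexed = Table.AddIndexColumn(WithValue, "Index", 0, 1, Int64.Type),
    // Assumption: four rows per event, so rows 0-3 are event 0, rows 4-7 are event 1, ...
    WithEventNo = Table.AddColumn(Indexed, "EventNo", each Number.IntegerDivide([Index], 4), Int64.Type),
    KeyValue = Table.SelectColumns(WithEventNo, {"EventNo", "Key", "Value"}),
    // One column per distinct key, one row per event
    Pivoted = Table.Pivot(KeyValue, List.Distinct(KeyValue[Key]), "Key", "Value"),
    Result = Table.RemoveColumns(Pivoted, {"EventNo"})
in
    Result
```

If events can have a varying number of lines, the marker-row-plus-fill-down approach used in the answers above is the safer choice.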

Power Query. How can I dynamically transform a list of records to columns?

I'm returning some JSON data from an API. There's an ID, a bunch of other fields (not included in this), and most importantly a List. The List contains a number of records (the same number and structure for each row)
I'm trying to map the records to columns rather than having to "Expand to New Rows". Each record in the list contains 3 fields (ID, Value & Text).
This is the current structure:
I would like to transform the list of records to look something like this:
The number of records within the List can change. So today I have 2 records in a List for each ID, but tomorrow there could be 4 records. So I need something dynamic that will add a new column into the table based on each record available in the list.
Any help would be much appreciated
See if this works. Built in sample.
let Source = Table.AddColumn(#table({"ID"}, {{"111"}, {"222"},{"333"}}), "Custom", each List.Repeat({[ID="Field", Value="BOB",Text="ASDSD"]},3)),
#"Added Custom2" = Table.AddColumn(Source, "Custom.2", each Table.FromRecords([Custom])),
#"Added Index" = Table.AddIndexColumn(#"Added Custom2", "Index", 0, 1, Int64.Type),
Names= Table.ColumnNames ( Table.Combine ( #"Added Index"[Custom.2] ) ),
NewNames=List.Transform(Names, each "extracted"&_),
Expand= Table.ExpandTableColumn ( #"Added Index", "Custom.2", Names,NewNames ),
#"Removed Columns" = Table.RemoveColumns(Expand,{"Custom"}),
Unpivot=Table.Unpivot(#"Removed Columns",NewNames,"attribute","value"),
#"Grouped Rows" = Table.Group(Unpivot, {"Index"}, {{"data", each
Table.TransformColumns(
Table.AddIndexColumn(_, "Index2", 1, 1, Int64.Type)
,{{"Index2", each Number.RoundUp(_/ List.Count(NewNames),0), type number}})
, type table }}),
#"Removed Other Columns" = Table.SelectColumns(#"Grouped Rows",{"data"}),
ColumnsToExpand = List.Distinct(List.Combine(List.Transform(Table.Column(#"Removed Other Columns", "data"), each if _ is table then Table.ColumnNames(_) else {}))),
#"Expanded Part2" = Table.ExpandTableColumn(#"Removed Other Columns", "data",ColumnsToExpand ,ColumnsToExpand ),
#"Merged Columns1" = Table.CombineColumns(Table.TransformColumnTypes(#"Expanded Part2", {{"Index2", type text}}, "en-US"),{"Index2", "attribute"},Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Merged"),
#"Pivoted Column" = Table.Pivot(#"Merged Columns1", List.Distinct(#"Merged Columns1"[Merged]), "Merged", "value")
in #"Pivoted Column"
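A shorter route to a similar result, offered as a hedged sketch rather than as the answer's method: turn each row's list of records into one wide record whose field names carry a position suffix, then expand using whatever field names turn up in the data. The sample table, the Flatten helper and the suffix naming (ID1, Value1, ...) are assumptions for illustration.

```powerquery
let
    // Hypothetical sample: each row holds a list of records with ID, Value and Text
    Source = #table({"ID", "Custom"}, {
        {"111", {[ID = "F1", Value = "A", Text = "x"], [ID = "F2", Value = "B", Text = "y"]}},
        {"222", {[ID = "F1", Value = "C", Text = "z"], [ID = "F2", Value = "D", Text = "w"]}}}),
    // Illustrative helper: merge a list of records into one record,
    // renaming each field to <name><position>, e.g. Value1, Value2
    Flatten = (records as list) as record =>
        Record.Combine(
            List.Transform(
                List.Positions(records),
                (i) => Record.RenameFields(
                    records{i},
                    List.Transform(
                        Record.FieldNames(records{i}),
                        (name) => {name, name & Text.From(i + 1)})))),
    Flattened = Table.TransformColumns(Source, {{"Custom", Flatten}}),
    // Collect field names across all rows so the expansion adapts when the list grows
    Names = List.Distinct(List.Combine(List.Transform(Flattened[Custom], Record.FieldNames))),
    Result = Table.ExpandRecordColumn(Flattened, "Custom", Names)
in
    Result
```

With the sample above the resulting columns should be ID, ID1, Value1, Text1, ID2, Value2, Text2. Rows whose list is shorter than the longest one get null in the extra columns.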

Multiplying a current day's value by the next day's value

I have a date_column, an X_column and a sales_column.
01/01/2022 | 3 | 50
02/01/2022 | 4 | 10
03/01/2022 | 1 | 5
and I want to multiply:
50 * 4 = 200
10*1 = 10
...
Powerquery ...
If the dates are always consecutive and already sorted by date, then the most understandable way is:
Add column, index column
Add column, custom column with formula
= #"Added Index"{[Index]+1}[Column1]
Click select the three numerical columns, transform, data type decimal
Add column, custom column with formula
=[Column2]*[Custom]
That will multiply them on each row. The bottom item will return an error, which you can replace with something else (right-click the column, Replace Errors, and enter a replacement value).
sample code:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each #"Added Index"{[Index]+1}[Column1]),
#"Changed Type1" = Table.TransformColumnTypes(#"Added Custom",{{"Column2", type number}, {"Index", type number}, {"Custom", type number}}),
#"Added Custom1" = Table.AddColumn(#"Changed Type1", "Custom.1", each [Column2]*[Custom])
in #"Added Custom1"
A more advanced way does the calculation regardless of the sort order of the data, and returns an error if there is no match for the next day (again, you can right-click the column, choose Replace Errors, and enter a replacement value). It assumes the columns in question are called date, Column1 and Column2.
add column ... custom column ... with code
(i)=>Table.SelectRows(Source, each [date]=Date.AddDays(i[date],1))[Column1]{0}
then follow steps above for [Column2]*[Custom]
sample full code:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Custom" = Table.AddColumn(Source,"Offset",(i)=>Table.SelectRows(Source, each [date]=Date.AddDays(i[date],1))[Column1]{0}),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom", each [Column2]*[Offset])
in #"Added Custom1"
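When the rows are consecutive and already sorted, the per-row lookup can also be avoided by shifting the whole column up by one position. The following is a minimal sketch under that assumption, using the three sample rows from the question (dates read as day/month/year); the step names NextColumn1, WithNext and Product are hypothetical.

```powerquery
let
    // Sample rows from the question: date, X (Column1), sales (Column2)
    Source = #table(
        type table [date = date, Column1 = number, Column2 = number],
        {
            {#date(2022, 1, 1), 3, 50},
            {#date(2022, 1, 2), 4, 10},
            {#date(2022, 1, 3), 1, 5}
        }),
    // Column1 shifted up by one row; the last row has no next day, so it gets null
    NextColumn1 = List.Skip(Source[Column1], 1) & {null},
    // Append the shifted list as a new column called Next
    WithNext = Table.FromColumns(
        Table.ToColumns(Source) & {NextColumn1},
        Table.ColumnNames(Source) & {"Next"}),
    // null * number evaluates to null, so the last row yields null instead of an error
    Result = Table.AddColumn(WithNext, "Product", each [Column2] * [Next], type nullable number)
in
    Result
```

For the sample this gives 200 and 10 in the Product column, followed by null for the last row.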

Merging multiple rows based on criteria into 1 in power query

Could you please help me solve the following tasks?
For example, I have this data set:
I need to create a task with a description of which discounts need to be checked. It should be in the following format, though:
SKUs within the same brand and with the same discount depth should be merged into one row: Check Discount 10% for brand 1 for SKUs: cream & oil.
The others should remain as separate rows because they have different discounts within the brand:
Check Discount 20% for brand 2 for SKU detergent
Check Discount 15% for brand 2 for SKU tabs.
There are more levels of data, e.g. the task should be within the same outlet (if there are x > 1 outlets, the task will be multiplied by x, the number of outlets). But I guess the rest should be easy once I have a method for the task above.
Should be pretty similar to the previous one, but I might be wrong
Monitor & Catalogue columns basically describe which rows can be merged. So the output out of this table should be 2 rows:
Check positioning of 1-oil and 2-tabs on the monitor
Check positioning of 1-cream and 2-detergent on the catalogue
There can be multiple levels of aggregation, i.e. on top of rows with 1's, there can be rows with 2's - meaning they should be merged in separate task as well. 0 in all cases means - don't take.
I understand it might be a little overcomplicated, but I'm looking to speed up this process in Power Query, as it's currently done with VBA that analyzes each row and finds matching positions.
Here's the desired result with input data:
Everything further is simple. I just eliminate brand-sku and group by task.
Thank you!
You guys are making it more complicated than it needs to be. The key here is that you can aggregate text columns when using Group By.
Here's how I'd do the first one:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUouSk3MBdKGBqpKsToQsfzMHCQRIyA7JbUktSg9Na8EyDZCEi9JTCoGKTUFCsUCAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Brand = _t, SKU = _t, Discount = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Brand", Int64.Type}, {"SKU", type text}, {"Discount", Percentage.Type}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Brand", "Discount"}, {{"SKU", each Text.Combine([SKU],", "), type nullable text}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Result task", each "Check discount " & Number.ToText([Discount], "P0") & " for brand " & Number.ToText([Brand]) & " for SKU: " & [SKU], type text)
in
#"Added Custom"
Result:
Note that I've grouped on Brand and Discount and aggregated the SKU column by combining each row into a list separated by ", ", using Text.Combine([SKU],", ") as the aggregating function instead of any of the default options you can choose. I usually pick Max as the aggregation and then replace that function, i.e. List.Max([SKU]), in the formula for that step.
Once you've done that grouping, you just need to string the pieces together in a custom column.
The second one can be done similarly with the added step of concatenating Brand and SKU into one column before grouping.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUouSk3MBdIGQGyoFKsDEc3PzAHzQeIgMSMgKyW1JLUoPTWvBEU1SKYkMakYoTwWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Brand = _t, SKU = _t, Monitor = _t, Catalogue = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Brand", Int64.Type}, {"SKU", type text}, {"Monitor", Int64.Type}, {"Catalogue", Int64.Type}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "BrandSKU", each Number.ToText([Brand]) & "-" & [SKU], type text),
#"Grouped Rows" = Table.Group(#"Added Custom", {"Monitor", "Catalogue"}, {{"BrandSKU", each Text.Combine([BrandSKU], ", "), type text}}),
#"Added Custom1" = Table.AddColumn(#"Grouped Rows", "Result task", each "Check placement of " & [BrandSKU] & " on " & (if [Catalogue] = 1 then "Catalogue" else "Monitor"), type text)
in
#"Added Custom1"
Here's the first one - note that the format you requested in the first one (data entered into separate rows within the same cell using alt+enter) isn't supported in powerquery, so I separated the data with commas instead.
Instructions
Add column>add index column
Highlight index columns>transform>pivot>sku as values>advanced options>don't aggregate
Highlight all of the columns to the right>transform>merge columns (choose a separator if you want one, I chose commas)
Transform>Replace ,, with , (may have to do a few times)
Change Brand to text, Discount to %
Add column>custom column formula = "Check discount " & Number.ToText([Discount]*100) & "% for brand " & [Brand] & " for SKU " & Text.Trim([Merged],",")
Before/After
M Code
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Brand", Int64.Type}, {"SKU", type text}, {"Discount", type number}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Added Index", {{"Index", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Added Index", {{"Index", type text}}, "en-US")[Index]), "Index", "SKU"),
#"Merged Columns" = Table.CombineColumns(#"Pivoted Column",{"1", "2", "3", "4", "5"},Combiner.CombineTextByDelimiter(",", QuoteStyle.None),"Merged"),
#"Changed Type1" = Table.TransformColumnTypes(#"Merged Columns",{{"Brand", type text}, {"Discount", Percentage.Type}}),
#"Added Custom" = Table.AddColumn(#"Changed Type1", "Custom", each "Check discount " & Number.ToText([Discount]*100) & "% for brand " & [Brand] & " for SKU " & Text.Trim([Merged],","))
in
#"Added Custom"
2nd Example Instructions
Note: For this one it is easiest to do Monitor and Catalogue separately. Just filter for a different one each time.
Add column>add index column
Highlight index columns>transform>pivot>sku as values>advanced options>don't aggregate
Filter for Monitor =1
Delete Monitor & Catalogue columns
Merge remaining columns, use - as separator
Transpose
Merge columns using , as separator
Find and replace -- with - (may have to do a couple times)
Custom column> Use the formula ="Check placement of " & Text.Trim([Merged]) & " on Monitor"
2nd Example Before/After
2nd Example M-Code
let
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Brand", Int64.Type}, {"SKU", type text}, {"Monitor", Int64.Type}, {"Catalogue", Int64.Type}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Added Index", {{"Index", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Added Index", {{"Index", type text}}, "en-US")[Index]), "Index", "SKU"),
#"Filtered Rows" = Table.SelectRows(#"Pivoted Column", each ([Monitor] = 1)),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Monitor", "Catalogue"}),
#"Merged Columns" = Table.CombineColumns(Table.TransformColumnTypes(#"Removed Columns", {{"Brand", type text}}, "en-US"),{"Brand", "0", "1", "2", "3"},Combiner.CombineTextByDelimiter("-", QuoteStyle.None),"Merged"),
#"Transposed Table" = Table.Transpose(#"Merged Columns"),
#"Merged Columns1" = Table.CombineColumns(#"Transposed Table",{"Column1", "Column2"},Combiner.CombineTextByDelimiter(", ", QuoteStyle.None),"Merged"),
#"Replaced Value" = Table.ReplaceValue(#"Merged Columns1","--","-",Replacer.ReplaceText,{"Merged"}),
#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","--","-",Replacer.ReplaceText,{"Merged"}),
#"Added Custom" = Table.AddColumn(#"Replaced Value1", "Custom", each "Check placement of " & Text.Trim([Merged]) & " on Monitor")
in
#"Added Custom"
Hopefully that gets you started on how to apply PQ to your data! You may have to adjust slightly if your data sets vary.
Thanks for your advice, Hooded 0ne. It gave me the right direction.
I've done only 1st part though, here are some adjustments I made:
Added $ to SKU to find position later to be replaced with "," - now I can clear the delimiters from merge in 1st step via replace ";" with blank, replace "$" with "," and Text.End or Trim the first "," in the row
Added Select Columns to the step after "Pivot columns". There's dynamic list of columns, so I can't hardcode "1,2,3,4,5" like you did
Here's my final code for p1:
#"Add $" = Table.AddColumn(#"Filtered Rows", "SKU_SYMBOL", each "$"&[ROI_LKA_BASE.SKU]),
#"Add Index" = Table.AddIndexColumn(#"Add $", "Index", 0, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Add Index", "TextIndex", each "TASK_"&Number.ToText([Index])),
#"Removed Columns2" = Table.RemoveColumns(#"Added Custom",{"Index", "Shelf start", "Shelf End", "KAM", "Вид Инф. АУ", "Место размещ. АУ", "Адрес", "Attribute", "Value", "ROI_LKA_BASE.Мониторы", "ROI_LKA_BASE.Каталог", "ROI_LKA_BASE.Confirmed with customer", "ROI_LKA_BASE.Confirmed plan", "ROI_LKA_BASE.SKU"}),
#"Pivoted Column" = Table.Pivot(#"Removed Columns2", List.Distinct(#"Removed Columns2"[TextIndex]), "TextIndex", "SKU_SYMBOL"),
ColumnsToSelect = List.Select(Table.ColumnNames(#"Pivoted Column"),each Text.Contains(_,"TASK")),
#"Select Pivoted Columns" = Table.SelectColumns(#"Pivoted Column",ColumnsToSelect),
#"Merged Columns" = Table.CombineColumns(#"Pivoted Column",Table.ColumnNames(#"Select Pivoted Columns"),Combiner.CombineTextByDelimiter(";", QuoteStyle.None),"PIVOT_MERGED"),
#"Replaced Value" = Table.ReplaceValue(#"Merged Columns",";","",Replacer.ReplaceText,{"PIVOT_MERGED"}),
#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","$",",",Replacer.ReplaceText,{"PIVOT_MERGED"}),
#"Added Custom1" = Table.AddColumn(#"Replaced Value1", "TASK", each "Проверить скидку на " & [ROI_LKA_BASE.Бренд] & ": "
& Text.End([PIVOT_MERGED],Text.Length([PIVOT_MERGED])-1)),
#"Grouped Rows" = Table.Group(#"Added Custom1", {"Promo ID", "promotool_code", "ROI_LKA_BASE.Начало акции", "ROI_LKA_BASE.Конец акции", "ROI_LKA_BASE.Chain code", "ROI_LKA_BASE.Описание промо", "ROI_LKA_BASE.Сеть", "TASK"}, {{"Count", each Table.RowCount(_), Int64.Type}})
Will return later with my solution on 2nd part.
Thanks again Hooded 0ne, big help.

Create list of date ranges from a list of dates using power query

Please help me create a function in Power Query.
At one of the steps of the query, as a result, I get a list of dates. Some go sequentially, some separately. The quantity is not fixed.
Example (MM.DD.YYYY):
{01/01/2019,
01/02/2019,
01/03/2019,
01/05/2019,
01/06/2019,
01/08/2019}
I need to determine all intervals of consecutive dates and reflect the list of such intervals. The interval is set by the start and end dates. If there is one continuous date, then it is the beginning and the end.
An example from the previous data:
{{01/01/2019, 01/03/2019},
{01/05/2019, 01/06/2019},
{01/08/2019, 01/08/2019}}
Please help me write a function to solve this problem.
In my data, there are about 10,000 lines, each of which has a list attached up to 365 days. It is desirable that the function works quickly.
It feels like List.Generate can help, but I don't understand this function very well.
This function, which I called Parse Dates, should do it:
(dateList) =>
let
#"Converted to Table" = Table.FromList(dateList, Splitter.SplitByNothing(), {"Dates"}, null, ExtraValues.Error),
#"Changed Type" = Table.TransformColumnTypes(#"Converted to Table",{{"Dates", type date}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
#"Added Start" = Table.AddColumn(#"Added Index", "Start", each try if #"Added Index"{[Index]-1}[Dates] = Date.AddDays([Dates],-1) then null else [Dates] otherwise [Dates]),
#"Added End" = Table.AddColumn(#"Added Start", "End", each try if #"Added Start"{[Index]+1}[Dates] = Date.AddDays([Dates],1) then null else [Dates] otherwise [Dates]),
#"Added Custom1" = Table.AddColumn(#"Added End", "Group", each "Group"),
#"Grouped Rows" = Table.Group(#"Added Custom1", {"Group"}, {{"Start", each List.RemoveNulls([Start]), type anynonnull}, {"End", each List.RemoveNulls([End]), type anynonnull}}),
#"Added Custom2" = Table.AddColumn(#"Grouped Rows", "Tabled", each Table.FromColumns({[Start],[End]},{"Start","End"})),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom2",{"Tabled"}),
#"Expanded Tabled" = Table.ExpandTableColumn(#"Removed Other Columns", "Tabled", {"Start", "End"}, {"Start", "End"}),
#"Added Custom3" = Table.AddColumn(#"Expanded Tabled", "Custom", each List.Dates([Start],Number.From([End]-[Start])+1,#duration(1,0,0,0))),
#"Removed Other Columns1" = Table.SelectColumns(#"Added Custom3",{"Custom"})
in
#"Removed Other Columns1"
I invoked it with this:
let
Source = #"Parse Dates"(#"Dates List")
in
Source
...against this list, which I called Dates List:
...to get this result:
I managed to figure out how to use the List.Generate function to solve this problem.
This function works a little faster for me.
I called it fn_ListOfDatesToDateRanges.
To invoke it, pass it a column in which each row holds a list of dates.
Information from the KenR blog helped me with development.
To compare performance, I used an array with about 250 thousand rows. My version took about 45 seconds versus roughly 1 minute, i.e. about 33% faster.
Test file with used functions is here
(Dates)=>
let
InputData = List.Transform(List.Sort(Dates,Order.Ascending), each DateTime.Date(DateTime.From(_, "en-US"))),
DateRangesGen = List.Generate(
()=> [Date=null, Counter=0],
each [Counter]<=List.Count(InputData),
each [
Date =
let
CurrentRowDate = InputData{[Counter]},
PreviousRowDate = try InputData{[Counter]-1} otherwise null,
NextRowDate = try InputData{[Counter]+1} otherwise null,
MyDate = [Start_Date=
(if PreviousRowDate = null then CurrentRowDate else
if CurrentRowDate = Date.AddDays(Replacer.ReplaceValue(PreviousRowDate,null,0),1) then null else CurrentRowDate),
End_Date=(
if NextRowDate = null then CurrentRowDate else
if CurrentRowDate=Date.AddDays(Replacer.ReplaceValue(NextRowDate,null,0),-1) then null else CurrentRowDate)
]
in
MyDate,
Counter=[Counter]+1],
each [Date]),
DateRanges = Table.ExpandTableColumn(Table.SelectColumns(Table.AddColumn(Table.Group(Table.AddColumn(Table.ExpandRecordColumn(Table.FromList(DateRangesGen, Splitter.SplitByNothing(), null, null, ExtraValues.Error), "Column1", {"Start_Date", "End_Date"}, {"Start_Date", "End_Date"}), "Group", each "Group"), "Group", {{"Start_Date", each List.RemoveNulls([Start_Date]), type anynonnull}, {"End_Date", each List.RemoveNulls([End_Date]), type anynonnull}}), "Tabled", each Table.FromColumns({[Start_Date],[End_Date]},{"Start_Date","End_Date"})),{"Tabled"}), "Tabled", {"Start_Date", "End_Date"}, {"Start_Date", "End_Date"})
in
DateRanges
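The core idea in both answers (start a new range whenever a date is not exactly one day after the previous one) can also be written as a single fold over the sorted list. The following is a minimal sketch using List.Accumulate, not either author's function and not benchmarked against them; the name ToRanges and the inline sample list are assumptions for illustration.

```powerquery
let
    // Illustrative helper: list of dates -> list of {start, end} pairs
    ToRanges = (Dates as list) as list =>
        let
            Sorted = List.Sort(List.Distinct(Dates)),
            Ranges = List.Accumulate(Sorted, {}, (state, current) =>
                if List.IsEmpty(state) then
                    // First date opens the first range
                    {{current, current}}
                else
                    let
                        last = List.Last(state)
                    in
                        if Date.AddDays(last{1}, 1) = current then
                            // Consecutive day: extend the end of the last range
                            List.RemoveLastN(state, 1) & {{last{0}, current}}
                        else
                            // Gap: open a new range
                            state & {{current, current}})
        in
            Ranges,
    // Sample list from the question
    Sample = {
        #date(2019, 1, 1), #date(2019, 1, 2), #date(2019, 1, 3),
        #date(2019, 1, 5), #date(2019, 1, 6), #date(2019, 1, 8)},
    Result = ToRanges(Sample)
in
    Result
```

For the sample list this yields three pairs: 1 Jan to 3 Jan, 5 Jan to 6 Jan, and 8 Jan to 8 Jan 2019, matching the expected output in the question.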