I am new to data analysis and I'm wondering if I can get pointers for what I am facing at the moment.
I have an ICS calendar that I am trying to export into a spreadsheet. However, the data I receive is organised as follows:
Data
Event: NAME XXX
Date: xx xx xx
Location: NOWHERE
URL: www.hi.com
Event: NAME YYY
Date: yy yy yy
Location: SOMEWHERE
URL: www.hello.com
... and so on
I need to be able to promote the text before the : delimiter in every group of four rows to headers, so that my data looks like this:
Event | Date | Location | URL
NAME X | xx xx xx | SOMEWHERE | hello.com
NAME Y | xx xx xx | NOWHERE | bye.com
I can use SQL, Python, or data visualisation software such as Power BI; alternatively, good ol' Excel works fine.
I tried other tools and workarounds, such as uploading the ICS calendar into my Outlook calendar and then exporting the calendar. This worked fine, but it is a workaround.
I would like to be able to load the information via the ICS link directly into a CSV/Excel file because I am using the information to populate a Power BI dashboard.
This
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45Wci1LzSuxUvBz9HVViIiIUIrViVZySSxJtVKoqIAgsJBPfnJiSWZ+HlClf7iHa5ArWDQ0yMdKoby8XC8jUy85PxcshmxgZGQkkoGVlRCEZmCwv6+rRzimkak5OfkQU2MB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Data = _t]),
// Split each line at the colon into a field name (Data.1) and a value (Data.2)
#"Split Column by Delimiter" = Table.SplitColumn(
    Source, "Data", Splitter.SplitTextByDelimiter(":", QuoteStyle.Csv), {"Data.1", "Data.2"}),
// Number the rows so that each "Event" row can serve as a record id
#"Added Index" = Table.AddIndexColumn(
    #"Split Column by Delimiter", "Index", 1, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(
    #"Added Index", "Custom", each if Text.Contains([Data.1],"Event") then [Index] else null),
// Fill the record id down so that all lines of one event share it
#"Filled Down" = Table.FillDown(
    #"Added Custom",{"Custom"}),
#"Removed Columns" = Table.RemoveColumns(
    #"Filled Down",{"Index"}),
// Pivot the field names into columns, giving one row per record id
#"Pivoted Column" = Table.Pivot(
    #"Removed Columns", List.Distinct(#"Removed Columns"[Data.1]), "Data.1", "Data.2"),
#"Removed Columns1" = Table.RemoveColumns(
    #"Pivoted Column",{"Custom"})
in
#"Removed Columns1"
is how to get from the original single-column layout to one row per event, with Event, Date, Location and URL as columns.
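Since the question also allows Python, the same reshaping can be sketched there. This is a minimal sketch, not part of the answer above: it assumes every record starts with an "Event" line and every line has the form "Key: value". The function name lines_to_rows and the variable names are made up for illustration.

```python
import csv
import io

def lines_to_rows(lines, first_key="Event"):
    """Group 'Key: value' lines into one dict per record.

    Assumption: a new record starts whenever first_key is seen.
    """
    rows = []
    for line in lines:
        if ":" not in line:
            continue  # skip lines that are not key/value pairs
        # Split on the first colon only, so values that contain colons stay intact
        key, value = line.split(":", 1)
        key, value = key.strip(), value.strip()
        if key == first_key or not rows:
            rows.append({})
        rows[-1][key] = value
    return rows

# Sample lines taken from the question
sample = [
    "Event: NAME XXX",
    "Date: xx xx xx",
    "Location: NOWHERE",
    "URL: www.hi.com",
    "Event: NAME YYY",
    "Date: yy yy yy",
    "Location: SOMEWHERE",
    "URL: www.hello.com",
]

rows = lines_to_rows(sample)

# The header is the union of all keys, in first-seen order
header = list(dict.fromkeys(key for row in rows for key in row))

# Write the rows as CSV text (swap the buffer for a real file to save it)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=header, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

Splitting on the first colon only is a deliberate choice here, so that a URL such as https://... would not be cut at its scheme.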
In Power Query, try this on the sample data set provided above:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Filtered Rows1" = Table.SelectRows(Source, each [Column1] <> null),
#"Added Index" = Table.AddIndexColumn(#"Filtered Rows1", "Index", 0, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each if Text.Contains([Column1],"BEGIN:VEVENT") then [Index] else null),
#"Filled Down" = Table.FillDown(#"Added Custom",{"Custom"}),
#"Filtered Rows" = Table.SelectRows(#"Filled Down", each ([Custom] <> null)),
#"Removed Errors" = Table.RemoveRowsWithErrors(#"Filtered Rows", {"Custom"}),
#"Replaced Value" = Table.ReplaceValue(#"Removed Errors","ORGANIZER;CN=","ORGANIZER/CN:",Replacer.ReplaceText,{"Column1"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Replaced Value", "Column1", Splitter.SplitTextByEachDelimiter({":"}, QuoteStyle.Csv, false), {"Column1.1", "Column1.2"}),
#"Removed Columns" = Table.RemoveColumns(#"Split Column by Delimiter",{"Index"}),
removeHTML1 = Table.TransformColumns(#"Removed Columns",{{"Column1.2",each try Text.Combine(List.Select(List.Alternate(Text.SplitAny(_,"<>"),1,1,1), each _<>""), "") otherwise null, type text}}),
#"Pivoted Column" = Table.Pivot(removeHTML1, List.Distinct(removeHTML1[Column1.1]), "Column1.1", "Column1.2"),
extractEmail = Table.AddColumn(#"Pivoted Column", "email", each List.Last(Text.Split([#"ORGANIZER/CN"],":")))
in extractEmail
I'm returning some JSON data from an API. There's an ID, a bunch of other fields (not included here), and most importantly a List. The List contains a number of records (the same number and structure for each row).
I'm trying to map the records to columns rather than having to "Expand to New Rows". Each record in the list contains 3 fields (ID, Value & Text).
This is the current structure:
I would like to transform the list of records to look something like this:
The number of records within the List can change. So today I have 2 records in a List for each ID, but tomorrow there could be 4 records. So I need something dynamic that will add a new column into the table based on each record available in the list.
Any help would be much appreciated.
See if this works. The sample data is built into the query.
let Source = Table.AddColumn(#table({"ID"}, {{"111"}, {"222"},{"333"}}), "Custom", each List.Repeat({[ID="Field", Value="BOB",Text="ASDSD"]},3)),
#"Added Custom2" = Table.AddColumn(Source, "Custom.2", each Table.FromRecords([Custom])),
#"Added Index" = Table.AddIndexColumn(#"Added Custom2", "Index", 0, 1, Int64.Type),
Names= Table.ColumnNames ( Table.Combine ( #"Added Index"[Custom.2] ) ),
NewNames=List.Transform(Names, each "extracted"&_),
Expand= Table.ExpandTableColumn ( #"Added Index", "Custom.2", Names,NewNames ),
#"Removed Columns" = Table.RemoveColumns(Expand,{"Custom"}),
Unpivot=Table.Unpivot(#"Removed Columns",NewNames,"attribute","value"),
#"Grouped Rows" = Table.Group(Unpivot, {"Index"}, {{"data", each
Table.TransformColumns(
Table.AddIndexColumn(_, "Index2", 1, 1, Int64.Type)
,{{"Index2", each Number.RoundUp(_/ List.Count(NewNames),0), type number}})
, type table }}),
#"Removed Other Columns" = Table.SelectColumns(#"Grouped Rows",{"data"}),
ColumnsToExpand = List.Distinct(List.Combine(List.Transform(Table.Column(#"Removed Other Columns", "data"), each if _ is table then Table.ColumnNames(_) else {}))),
#"Expanded Part2" = Table.ExpandTableColumn(#"Removed Other Columns", "data",ColumnsToExpand ,ColumnsToExpand ),
#"Merged Columns1" = Table.CombineColumns(Table.TransformColumnTypes(#"Expanded Part2", {{"Index2", type text}}, "en-US"),{"Index2", "attribute"},Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Merged"),
#"Pivoted Column" = Table.Pivot(#"Merged Columns1", List.Distinct(#"Merged Columns1"[Merged]), "Merged", "value")
in #"Pivoted Column"
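For comparison, the target layout can also be stated compactly outside Power Query. The following Python sketch is only an illustration under assumptions: each row is taken to be a dict with an ID and a list of records holding the fields ID, Value and Text, as described in the question. The key name "Items", the function name flatten_records and the field_position column naming are hypothetical.

```python
def flatten_records(rows, list_key="Items"):
    """Spread each row's list of records across numbered columns.

    A record at position n contributes columns named '<field>_<n>',
    so the number of columns grows with the length of the list.
    """
    flat_rows = []
    for row in rows:
        # Keep every field except the list itself
        flat = {key: value for key, value in row.items() if key != list_key}
        for position, record in enumerate(row.get(list_key, []), start=1):
            for field, value in record.items():
                flat[f"{field}_{position}"] = value
        flat_rows.append(flat)
    return flat_rows

# Hypothetical API result: the second row has fewer records than the first
api_rows = [
    {"ID": "111", "Items": [{"ID": "A", "Value": 1, "Text": "one"},
                            {"ID": "B", "Value": 2, "Text": "two"}]},
    {"ID": "222", "Items": [{"ID": "A", "Value": 3, "Text": "three"}]},
]

flat = flatten_records(api_rows)

# Union of all column names in first-seen order, usable as a table header
columns = list(dict.fromkeys(key for row in flat for key in row))
```

Because the column names are derived from the data, nothing needs to change when the lists grow from 2 to 4 records.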
I have a data model with two tables sharing the same columns.
I have merged the tables using the prefixes "Old." and "New."
I'd like to add a calculated column for each column that shows if the values are different with the name like "Column_IsDifferent" and a boolean value of true or false.
I have already found out that you can add multiple columns by using List.Accumulate. But for some reason my code seems not to work as expected:
= List.Accumulate(List.Select(Table.ColumnNames(#"Extend joined table"), each Text.StartsWith(_, "New")), #"Extend joined table", (state, current) => Table.AddColumn(state, Text.RemoveRange(current, 0, 4) & "_IsDifferent", each Table.Column(state, current) <> Table.Column(state, "Old." & Text.RemoveRange(current, 0, 4)), type logical))
Basically, it takes forever to load data and I don't get an error message...
I suspect there is something wrong with this part:
Table.Column(state, current) <> Table.Column(state, "Old." & Text.RemoveRange(current, 0, 4))
You can try this in Power Query. It adds a third column to each pair of columns, showing whether they match:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Added Index", {"Index"}, "Attribute", "Value"),
#"Duplicated Column" = Table.DuplicateColumn(#"Unpivoted Other Columns", "Attribute", "Attribute - Copy"),
#"Split Column by Delimiter" = Table.SplitColumn(#"Duplicated Column", "Attribute - Copy", Splitter.SplitTextByDelimiter(".", QuoteStyle.Csv), {"A1", "A2"}),
#"Grouped Rows" = Table.Group(#"Split Column by Delimiter", {"Index", "A2"}, {{"data", each
let compare = if _{0}[Value] = _{1}[Value] then "match" else "nomatch"
in Table.InsertRows( _,1,{[Index = _{0}[Index], Attribute = "Delta."&_{0}[A2], Value=compare, A1="Delta", A2=_{0}[A2]]})
, type table }}),
#"Expanded data" = Table.ExpandTableColumn(#"Grouped Rows", "data", {"Attribute", "Value" }, {"Attribute", "Value"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded data",{"A2"}),
#"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[Attribute]), "Attribute", "Value"),
#"Removed Columns1" = Table.RemoveColumns(#"Pivoted Column",{"Index"})
in #"Removed Columns1"
I just stumbled upon something. What do you think about changing this:
each Table.Column(state, current) <> Table.Column(state, "Old." & Text.RemoveRange(current, 0, 4))
into:
each Record.Field(_, current) <> Record.Field(_, "Old." & Text.RemoveRange(current, 0, 4))
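The change makes sense because the function passed to Table.AddColumn is called once per row with _ bound to that row's record. Table.Column(state, current) returns the whole column as a list, so the original version compared two entire columns on every row, while Record.Field(_, current) reads only the current row's value. The intended row-wise logic, written out in Python purely as an illustration (the function name add_difference_flags and the sample data are invented):

```python
def add_difference_flags(rows, old_prefix="Old.", new_prefix="New."):
    """Add '<name>_IsDifferent' for every 'New.<name>' column, row by row."""
    result = []
    for row in rows:
        flagged = dict(row)  # copy, so the input rows stay untouched
        for name in row:
            if name.startswith(new_prefix):
                base = name[len(new_prefix):]
                # Compare the values of this one row only
                flagged[base + "_IsDifferent"] = row[name] != row.get(old_prefix + base)
        result.append(flagged)
    return result

# Hypothetical merged table with "Old." and "New." prefixes
merged = [
    {"Old.Name": "Ann", "New.Name": "Ann", "Old.City": "Oslo", "New.City": "Bergen"},
    {"Old.Name": "Bob", "New.Name": "Rob", "Old.City": "Rome", "New.City": "Rome"},
]

flagged = add_difference_flags(merged)
```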
This is what my table looks like (1.7 million rows):
I'm trying to build a running total per customer ID and date.
This is easy to express using DAX, but unfortunately I don't have enough memory on my machine (16GB RAM).
So, I'm trying to find an alternative with Power Query M using buffered tables, etc. but that is too complicated for me.
Can anyone help? Thank you so much in advance!
EDIT: After sorting by Date and CustomerID, added index and added a custom column with:
= Table.AddColumn(#"Added Index", "Personalizado", each (i)=>List.Sum(Table.SelectRows(#"Added Index", each [CustomerID]=i[CustomerID] and [Index]<=i[Index]) [Sales]))
I get the following:
EDIT2:
The whole code:
let
Origem = dataset,
#"Linhas Agrupadas" = Table.Group(Origem, {"Date", "CustomerID"}, {{"Sales", each List.Sum([Sales]), type nullable number}}),
#"Linhas Ordenadas" = Table.Sort(#"Linhas Agrupadas",{{"Date", Order.Ascending}, {"CustomerID", Order.Ascending}}),
#"Linhas Filtradas" = Table.SelectRows(#"Linhas Ordenadas", each [Sales] <> 0),
#"Added Index" = Table.AddIndexColumn(#"Linhas Filtradas", "Index", 0, 1, Int64.Type),
#"Personalizado Adicionado" = Table.AddColumn(#"Added Index","CumSum",(i)=>List.Sum(Table.SelectRows(#"Added Index", each [CustomerID]=i[CustomerID] and [Index]<=i[Index]) [Sales]), type number )
in
#"Personalizado Adicionado"
Method 1:
Sort your data to start with, perhaps on the date column and CustomerID column. The row order you see on screen is the order in which the total accumulates.
Add column .. index column...
Add column .. custom column with formula
= (i)=>List.Sum(Table.SelectRows(#"Added Index", each [CustomerID]=i[CustomerID] and [Index]<=i[Index]) [Sales])
Right click index column and remove it
Adding a Table.Buffer() around the index step will likely help speed things up.
Sample full code:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Sorted Rows" = Table.Sort(Source,{{"CustomerID", Order.Ascending}, {"Date", Order.Ascending}}),
#"Added Index" = Table.Buffer(Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1)),
#"Added Custom" = Table.AddColumn(#"Added Index","CumSum",(i)=>List.Sum(Table.SelectRows(#"Added Index", each [CustomerID]=i[CustomerID] and [Index]<=i[Index]) [Sales]), type number ),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Index"})
in #"Removed Columns"
Method 2:
Create function fn_cum_total
(Input) =>
let withindex = Table.AddIndexColumn(Input, "Index", 1, 1),
cum = Table.AddColumn(withindex, "Total",each List.Sum(List.Range(withindex[Sales],0,[Index])))[Total]
in cum
Create query that uses that function to add cumulative totals to Sales column after grouping on CustomerID
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Sorted Rows" = Table.Buffer(Table.Sort(Source,{{"CustomerID", Order.Ascending}, {"Date", Order.Ascending}})),
Running_Total = Table.Group(#"Sorted Rows",{"CustomerID"},{{"Data",
(Input as table) as table => let zz = fn_cum_total(Input),
result = Table.FromColumns(Table.ToColumns(Input)&{zz}, Value.Type(Table.AddColumn(Input, "total", each null, type number))) in result, type table}} ),
#"Expanded Data" = Table.ExpandTableColumn(Running_Total, "Data", {"Date", "Sales", "total"}, {"Date", "Sales", "total"})
in #"Expanded Data"
I cannot take credit for Method 2; I borrowed it long ago and do not recall the source.
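Method 1 rescans the table for every row, which is why it gets slow on 1.7 million rows. The running total itself only needs a single pass over the sorted rows. The sketch below shows that logic in Python as an illustration; it is not a replacement for either method above. The column names Date, CustomerID and Sales come from the question, while the function name running_totals and the sample values are made up.

```python
from collections import defaultdict

def running_totals(rows):
    """Add a per-customer running total in one pass.

    rows must already be sorted in the order the total should accumulate.
    """
    totals = defaultdict(float)  # current total per CustomerID
    out = []
    for row in rows:
        totals[row["CustomerID"]] += row["Sales"]
        out.append({**row, "CumSum": totals[row["CustomerID"]]})
    return out

# Hypothetical sample data
sales = [
    {"Date": "2021-01-02", "CustomerID": "A", "Sales": 2.5},
    {"Date": "2021-01-01", "CustomerID": "A", "Sales": 10.0},
    {"Date": "2021-01-03", "CustomerID": "B", "Sales": 1.0},
    {"Date": "2021-01-01", "CustomerID": "B", "Sales": 5.0},
]

# Sort by date, then customer, as in the question, before accumulating
ordered = sorted(sales, key=lambda r: (r["Date"], r["CustomerID"]))
result = running_totals(ordered)
```

Keeping one total per customer in a dictionary means the work grows linearly with the number of rows instead of quadratically.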
Please help me create a function in Power Query.
At one of the steps of the query, as a result, I get a list of dates. Some go sequentially, some separately. The quantity is not fixed.
Example (MM/DD/YYYY):
{01/01/2019,
01/02/2019,
01/03/2019,
01/05/2019,
01/06/2019,
01/08/2019}
I need to find all runs of consecutive dates and return them as a list of intervals. Each interval is defined by its start and end dates. If a date has no neighbours, it is both the start and the end.
An example from the previous data:
{{01/01/2019, 01/03/2019},
{01/05/2019, 01/06/2019},
{01/08/2019, 01/08/2019}}
Please help me write a function to solve this problem.
In my data, there are about 10,000 lines, each of which has a list attached up to 365 days. It is desirable that the function works quickly.
It feels like List.Generate can help, but I don't understand this function very well.
This function, which I called Parse Dates, should do it:
(dateList) =>
let
#"Converted to Table" = Table.FromList(dateList, Splitter.SplitByNothing(), {"Dates"}, null, ExtraValues.Error),
#"Changed Type" = Table.TransformColumnTypes(#"Converted to Table",{{"Dates", type date}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
#"Added Start" = Table.AddColumn(#"Added Index", "Start", each try if #"Added Index"{[Index]-1}[Dates] = Date.AddDays([Dates],-1) then null else [Dates] otherwise [Dates]),
#"Added End" = Table.AddColumn(#"Added Start", "End", each try if #"Added Start"{[Index]+1}[Dates] = Date.AddDays([Dates],1) then null else [Dates] otherwise [Dates]),
#"Added Custom1" = Table.AddColumn(#"Added End", "Group", each "Group"),
#"Grouped Rows" = Table.Group(#"Added Custom1", {"Group"}, {{"Start", each List.RemoveNulls([Start]), type anynonnull}, {"End", each List.RemoveNulls([End]), type anynonnull}}),
#"Added Custom2" = Table.AddColumn(#"Grouped Rows", "Tabled", each Table.FromColumns({[Start],[End]},{"Start","End"})),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom2",{"Tabled"}),
#"Expanded Tabled" = Table.ExpandTableColumn(#"Removed Other Columns", "Tabled", {"Start", "End"}, {"Start", "End"}),
#"Added Custom3" = Table.AddColumn(#"Expanded Tabled", "Custom", each List.Dates([Start],Number.From([End]-[Start])+1,#duration(1,0,0,0))),
#"Removed Other Columns1" = Table.SelectColumns(#"Added Custom3",{"Custom"})
in
#"Removed Other Columns1"
I invoked it with this:
let
Source = #"Parse Dates"(#"Dates List")
in
Source
...against this list, which I called Dates List:
...to get this result:
I managed to figure out how to use the List.Generate function to solve this problem.
This function works a little faster for me.
I called it fn_ListOfDatesToDateRanges.
To invoke it, you must pass a column in each row of which there is a list of dates.
Information from the KenR blog helped me with development.
To compare performance, I used a table of about 250 thousand rows. This function took 45 seconds versus roughly 1 minute for the previous one, which I estimate as about 33% faster.
Test file with used functions is here
(Dates)=>
let
InputData = List.Transform(List.Sort(Dates,Order.Ascending), each DateTime.Date(DateTime.From(_, "en-US"))),
DateRangesGen = List.Generate(
()=> [Date=null, Counter=0],
each [Counter]<=List.Count(InputData),
each [
Date =
let
CurrentRowDate = InputData{[Counter]},
PreviousRowDate = try InputData{[Counter]-1} otherwise null,
NextRowDate = try InputData{[Counter]+1} otherwise null,
MyDate = [Start_Date=
(if PreviousRowDate = null then CurrentRowDate else
if CurrentRowDate = Date.AddDays(Replacer.ReplaceValue(PreviousRowDate,null,0),1) then null else CurrentRowDate),
End_Date=(
if NextRowDate = null then CurrentRowDate else
if CurrentRowDate=Date.AddDays(Replacer.ReplaceValue(NextRowDate,null,0),-1) then null else CurrentRowDate)
]
in
MyDate,
Counter=[Counter]+1],
each [Date]),
DateRanges = Table.ExpandTableColumn(Table.SelectColumns(Table.AddColumn(Table.Group(Table.AddColumn(Table.ExpandRecordColumn(Table.FromList(DateRangesGen, Splitter.SplitByNothing(), null, null, ExtraValues.Error), "Column1", {"Start_Date", "End_Date"}, {"Start_Date", "End_Date"}), "Group", each "Group"), "Group", {{"Start_Date", each List.RemoveNulls([Start_Date]), type anynonnull}, {"End_Date", each List.RemoveNulls([End_Date]), type anynonnull}}), "Tabled", each Table.FromColumns({[Start_Date],[End_Date]},{"Start_Date","End_Date"})),{"Tabled"}), "Tabled", {"Start_Date", "End_Date"}, {"Start_Date", "End_Date"})
in
DateRanges
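Both functions above rest on the same idea: a date starts an interval when the previous date is not the day before it, and ends one when the next date is not the day after it. A compact way to see that logic is the single-pass Python sketch below. It is an illustration under assumptions, not a translation of either function: the name to_date_ranges is invented, and duplicate dates are dropped before grouping.

```python
from datetime import date, timedelta

def to_date_ranges(dates):
    """Collapse dates into (start, end) tuples of consecutive days."""
    ranges = []
    # Sort and drop duplicates (assumption: duplicates carry no meaning)
    for day in sorted(set(dates)):
        if ranges and day - ranges[-1][1] == timedelta(days=1):
            ranges[-1][1] = day  # extend the current interval by one day
        else:
            ranges.append([day, day])  # start a new interval
    return [tuple(r) for r in ranges]

# The example list from the question
sample_dates = [
    date(2019, 1, 1), date(2019, 1, 2), date(2019, 1, 3),
    date(2019, 1, 5), date(2019, 1, 6),
    date(2019, 1, 8),
]

ranges = to_date_ranges(sample_dates)
```

For the sample list this gives three intervals, matching the expected result in the question: January 1 to 3, January 5 to 6, and January 8 on its own.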