I am new to data analysis and I'm wondering if I can get pointers for what I am facing at the moment.
I have an ICS calendar that I am trying to export into a spreadsheet. However, the data I receive is organised as follows:
Data
Event: NAME XXX
Date: xx xx xx
Location: NOWHERE
URL: www.hi.com
Event: NAME YYY
Date: yy yy yy
Location: SOMEWHERE
URL: www.hello.com
... and so on
I need to be able to promote the text before the ":" delimiter in every block of four rows to column headers, so that my data looks like this:
Event | Date | Location | URL
NAME X | xx xx xx | SOMEWHERE | hello.com
NAME Y | xx xx xx | NOWHERE | bye.com
I can use SQL, Python, or data visualisation software such as PowerBI. Alternatively, good ol' Excel works fine.
I tried other tools and workarounds, such as uploading the ICS calendar into my Outlook calendar and then exporting the calendar. This worked fine, but it is a workaround.
I would like to be able to load the information via the ICS link directly into a CSV/Excel because I am using the information to populate a PowerBI Dashboard.
This
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45Wci1LzSuxUvBz9HVViIiIUIrViVZySSxJtVKoqIAgsJBPfnJiSWZ+HlClf7iHa5ArWDQ0yMdKoby8XC8jUy85PxcshmxgZGQkkoGVlRCEZmCwv6+rRzimkak5OfkQU2MB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Data = _t]),
#"Split Column by Delimiter" = Table.SplitColumn(
Source, "Data", Splitter.SplitTextByDelimiter(":", QuoteStyle.Csv), {"Data.1", "Data.2"}),
#"Added Index" = Table.AddIndexColumn(
#"Split Column by Delimiter", "Index", 1, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(
#"Added Index", "Custom", each if Text.Contains([Data.1],"Event") then [Index] else null),
#"Filled Down" = Table.FillDown(
#"Added Custom",{"Custom"}),
#"Removed Columns" = Table.RemoveColumns(
#"Filled Down",{"Index"}),
#"Pivoted Column" = Table.Pivot(
#"Removed Columns", List.Distinct(#"Removed Columns"[Data.1]), "Data.1", "Data.2"),
#"Removed Columns1" = Table.RemoveColumns(
#"Pivoted Column",{"Custom"})
in
#"Removed Columns1"
is how to get from the stacked "Field: value" rows to one row per event, with Event, Date, Location and URL as columns.
In Power Query, try this on the sample data set provided above:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Filtered Rows1" = Table.SelectRows(Source, each [Column1] <> null),
#"Added Index" = Table.AddIndexColumn(#"Filtered Rows1", "Index", 0, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each if Text.Contains([Column1],"BEGIN:VEVENT") then [Index] else null),
#"Filled Down" = Table.FillDown(#"Added Custom",{"Custom"}),
#"Filtered Rows" = Table.SelectRows(#"Filled Down", each ([Custom] <> null)),
#"Removed Errors" = Table.RemoveRowsWithErrors(#"Filtered Rows", {"Custom"}),
#"Replaced Value" = Table.ReplaceValue(#"Removed Errors","ORGANIZER;CN=","ORGANIZER/CN:",Replacer.ReplaceText,{"Column1"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Replaced Value", "Column1", Splitter.SplitTextByEachDelimiter({":"}, QuoteStyle.Csv, false), {"Column1.1", "Column1.2"}),
#"Removed Columns" = Table.RemoveColumns(#"Split Column by Delimiter",{"Index"}),
removeHTML1 = Table.TransformColumns(#"Removed Columns",{{"Column1.2",each try Text.Combine(List.Select(List.Alternate(Text.SplitAny(_,"<>"),1,1,1), each _<>""), "") otherwise null, type text}}),
#"Pivoted Column" = Table.Pivot(removeHTML1, List.Distinct(removeHTML1[Column1.1]), "Column1.1", "Column1.2"),
extractEmail = Table.AddColumn(#"Pivoted Column", "email", each List.Last(Text.Split([#"ORGANIZER/CN"],":")))
in extractEmail
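Since the question also allows Python, here is a minimal sketch of the same reshaping outside Power Query. It assumes the calendar text has already been reduced to "Key: value" lines like the sample, and that "Event" is always the first key of a block; the function name rows_from_lines is made up for illustration.

```python
import csv
import io


def rows_from_lines(lines, first_key="Event"):
    """Turn repeating 'Key: value' lines into one dict per event."""
    rows = []
    for line in lines:
        if ":" not in line:
            continue  # skip blanks and anything that is not 'Key: value'
        # Split on the first colon only, so values such as URLs keep theirs.
        key, value = (part.strip() for part in line.split(":", 1))
        if key == first_key or not rows:
            rows.append({})  # the first key of each block starts a new row
        rows[-1][key] = value
    return rows


sample = [
    "Event: NAME XXX",
    "Date: xx xx xx",
    "Location: NOWHERE",
    "URL: www.hi.com",
    "Event: NAME YYY",
    "Date: yy yy yy",
    "Location: SOMEWHERE",
    "URL: www.hello.com",
]
table = rows_from_lines(sample)

# Write the rows as CSV text; the keys of the first row become the header.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(table[0]))
writer.writeheader()
writer.writerows(table)
csv_text = buffer.getvalue()
```

The resulting CSV text can be saved to a file and loaded into Excel or PowerBI. Starting a new row whenever the first key reappears plays the same role as the index plus fill-down step in the queries above.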
I have a data model with two tables sharing the same columns.
I have merged the tables using the prefixes "Old." and "New."
I'd like to add a calculated column for each pair of columns, named like "Column_IsDifferent", holding a boolean value (true or false) that shows whether the two values differ.
I have already found out that you can add multiple columns by using List.Accumulate. But for some reason my code seems not to work as expected:
= List.Accumulate(List.Select(Table.ColumnNames(#"Extend joined table"), each Text.StartsWith(_, "New")), #"Extend joined table", (state, current) => Table.AddColumn(state, Text.RemoveRange(current, 0, 4) & "_IsDifferent", each Table.Column(state, current) <> Table.Column(state, "Old." & Text.RemoveRange(current, 0, 4)), type logical))
Basically, it takes forever to load data and I don't get an error message...
I suspect there is something wrong with this part:
Table.Column(state, current) <> Table.Column(state, "Old." & Text.RemoveRange(current, 0, 4))
You can try this in Power Query. It adds a third column to each two-column pair, showing whether they match:
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Added Index", {"Index"}, "Attribute", "Value"),
#"Duplicated Column" = Table.DuplicateColumn(#"Unpivoted Other Columns", "Attribute", "Attribute - Copy"),
#"Split Column by Delimiter" = Table.SplitColumn(#"Duplicated Column", "Attribute - Copy", Splitter.SplitTextByDelimiter(".", QuoteStyle.Csv), {"A1", "A2"}),
#"Grouped Rows" = Table.Group(#"Split Column by Delimiter", {"Index", "A2"}, {{"data", each
let compare = if _{0}[Value] = _{1}[Value] then "match" else "nomatch"
in Table.InsertRows( _,1,{[Index = _{0}[Index], Attribute = "Delta."&_{0}[A2], Value=compare, A1="Delta", A2=_{0}[A2]]})
, type table }}),
#"Expanded data" = Table.ExpandTableColumn(#"Grouped Rows", "data", {"Attribute", "Value" }, {"Attribute", "Value"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded data",{"A2"}),
#"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[Attribute]), "Attribute", "Value"),
#"Removed Columns1" = Table.RemoveColumns(#"Pivoted Column",{"Index"})
in #"Removed Columns1"
Just stumbled upon something, what do you think about this change:
each Table.Column(state, current) <> Table.Column(state, "Old." & Text.RemoveRange(current, 0, 4))
into:
each Record.Field(_, current) <> Record.Field(_, "Old." & Text.RemoveRange(current, 0, 4))
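The change matters because inside Table.AddColumn the "each" function receives the current row as a record: Record.Field reads one value from that row, while Table.Column returns the entire column as a list, and does so again for every row. As a language-neutral illustration, here is a minimal Python sketch of the row-wise comparison. The column names Price and Name and the function name add_difference_flags are hypothetical.

```python
def add_difference_flags(rows, old_prefix="Old.", new_prefix="New."):
    """For every 'New.X' column, add 'X_IsDifferent' comparing it with 'Old.X'."""
    result = []
    for row in rows:
        flagged = dict(row)  # copy, so the input rows are left untouched
        for name in row:
            if name.startswith(new_prefix):
                base = name[len(new_prefix):]
                # Compare the two values of THIS row only, not whole columns.
                flagged[base + "_IsDifferent"] = row[name] != row[old_prefix + base]
        result.append(flagged)
    return result


rows = [
    {"Old.Price": 10, "New.Price": 12, "Old.Name": "a", "New.Name": "a"},
    {"Old.Price": 5, "New.Price": 5, "Old.Name": "b", "New.Name": "c"},
]
flagged = add_difference_flags(rows)
```

Each row is visited once and each comparison touches only two cells, which is why the row-wise version finishes quickly where the column-wise one appears to hang.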
I want to add a custom column that extracts the text between the delimiters "~"
This is the input :
This is the output which I am expecting:
I have tried the query below, but it's not working:
=Text.Select([#"Comments"]
{"A".."Z"} & {"1".."10"}&{"~"})
Could you please suggest a solution?
This works in Power Query (M).
It assumes the data comes from Table1, in Column1; adjust the code if that is not the case.
It also assumes there is a space between the closing ~ and the next word.
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Replaced Value" = Table.ReplaceValue(Source,"~ ","::",Replacer.ReplaceText,{"Column1"}),
#"Adjust edge case" = Table.TransformColumns(#"Replaced Value",{{"Column1", each if Text.End(_,1)="~" then Text.Start(_,Text.Length(_)-1) &"::" else _, type text}}),
#"Added Custom" = Table.AddColumn(#"Adjust edge case", "Custom", each List.Difference(List.Transform(Text.Split([Column1],"~"), each Text.BeforeDelimiter(_,"::")),{""})),
ColumnNames=List.Transform({1..List.Max(List.Transform(#"Added Custom"[Custom], each List.Count(_)))}, each "Data "&Text.From(_)),
#"Added Custom2" = Table.AddColumn(#"Adjust edge case", "Custom", each Text.Combine(List.Skip(List.Transform(Text.Split([Column1],"~"), each Text.BeforeDelimiter(_,"::")),1),"~")),
#"Split Column by Delimiter" = Table.SplitColumn(#"Added Custom2", "Custom", Splitter.SplitTextByDelimiter("~", QuoteStyle.Csv), ColumnNames)
in #"Split Column by Delimiter"
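The core idea, taking whatever sits between a pair of "~" characters, can be sketched in Python as follows. The original input and output samples are not reproduced here, so the sample string is an assumption, as is the function name between_tildes.

```python
import re


def between_tildes(text):
    """Return every piece of text enclosed by a pair of '~' characters."""
    # [^~]* matches anything except a tilde, so each match stops at the
    # closing '~' and the next search resumes after it.
    return re.findall(r"~([^~]*)~", text)


# Hypothetical comment text; the real column contents may differ.
sample = "abc ~X1~ def ~Y2~ ghi"
values = between_tildes(sample)
```

Unlike the query above, this sketch does not need a space after the closing "~", because the pattern pairs the delimiters directly.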
Could you please help me solve the following tasks?
For example, I have this data set:
What I need is to create a task with a description of which discounts need to be checked. It should be in the following format, though:
SKUs within the same brand with the same discount depth should be merged into one row: Check Discount 10% for brand 1 for SKUs: cream & oil.
The others should remain separate rows, as they have different discounts within the brand:
Check Discount 20% for brand 2 for SKU detergent
Check Discount 15% for brand 2 for SKU tabs.
There are more levels of data; for example, the task should be within the same outlet (if there are x > 1 outlets, the task is multiplied by x, one per outlet). But I guess that should be easy later on once I know the method for the task above.
The second task should be pretty similar to the first one, but I might be wrong.
The Monitor and Catalogue columns basically describe which rows can be merged. So the output from this table should be 2 rows:
Check positioning of 1-oil and 2-tabs on the monitor
Check positioning of 1-cream and 2-detergent on the catalogue
There can be multiple levels of aggregation, i.e. on top of rows with 1's there can be rows with 2's, meaning they should be merged into a separate task as well. 0 in all cases means "don't take".
I understand it might be a little overcomplicated, but I'm looking to speed up this process in Power Query, as it's currently done in VBA by analyzing each row and finding matching positions.
Here's the desired result with input data:
Everything further is simple. I just eliminate brand-sku and group by task.
Thank you!
You guys are making it more complicated than it needs to be. The key here is that you can aggregate text columns when using Group By.
Here's how I'd do the first one:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUouSk3MBdKGBqpKsToQsfzMHCQRIyA7JbUktSg9Na8EyDZCEi9JTCoGKTUFCsUCAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Brand = _t, SKU = _t, Discount = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Brand", Int64.Type}, {"SKU", type text}, {"Discount", Percentage.Type}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Brand", "Discount"}, {{"SKU", each Text.Combine([SKU],", "), type nullable text}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Result task", each "Check discount " & Number.ToText([Discount], "P0") & " for brand " & Number.ToText([Brand]) & " for SKU: " & [SKU], type text)
in
#"Added Custom"
Result:
Note that I've grouped on Brand and Discount and aggregated the SKU column by combining the rows into a list separated by ", ", using Text.Combine([SKU],", ") as the aggregating function instead of any of the default options you can choose. I usually pick Max as the aggregation and then replace that function, i.e. List.Max([SKU]), in the formula for that step.
Once you've done that grouping, you just need to string the pieces together in a custom column.
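For comparison, the same "group, then join the text values" idea as a minimal Python sketch. The rows are taken from the tasks listed in the question; storing the discounts as fractions and the function name discount_tasks are assumptions.

```python
from collections import defaultdict


def discount_tasks(rows):
    """Group SKUs by (brand, discount) and build one task text per group."""
    groups = defaultdict(list)  # dicts keep insertion order
    for brand, sku, discount in rows:
        groups[(brand, discount)].append(sku)
    return [
        f"Check discount {discount:.0%} for brand {brand} for SKU: {', '.join(skus)}"
        for (brand, discount), skus in groups.items()
    ]


rows = [
    (1, "cream", 0.10),
    (1, "oil", 0.10),
    (2, "detergent", 0.20),
    (2, "tabs", 0.15),
]
tasks = discount_tasks(rows)
```

The join inside the list comprehension plays the role of Text.Combine in the Group By step.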
The second one can be done similarly with the added step of concatenating Brand and SKU into one column before grouping.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUouSk3MBdIGQGyoFKsDEc3PzAHzQeIgMSMgKyW1JLUoPTWvBEU1SKYkMakYoTwWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Brand = _t, SKU = _t, Monitor = _t, Catalogue = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Brand", Int64.Type}, {"SKU", type text}, {"Monitor", Int64.Type}, {"Catalogue", Int64.Type}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "BrandSKU", each Number.ToText([Brand]) & "-" & [SKU], type text),
#"Grouped Rows" = Table.Group(#"Added Custom", {"Monitor", "Catalogue"}, {{"BrandSKU", each Text.Combine([BrandSKU], ", "), type text}}),
#"Added Custom1" = Table.AddColumn(#"Grouped Rows", "Result task", each "Check placement of " & [BrandSKU] & " on " & (if [Catalogue] = 1 then "Catalogue" else "Monitor"), type text)
in
#"Added Custom1"
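The question also mentions rows marked 2 (a separate task) and 0 ("don't take"). One way to cover those cases is to group per channel and per level, sketched below in Python. The sample rows are reconstructed from the two expected output rows in the question, since the original table is not reproduced here, and the function name placement_tasks is hypothetical.

```python
def placement_tasks(rows, channels=("Monitor", "Catalogue")):
    """Build one task per (channel, level) pair, ignoring level 0."""
    groups = {}
    for row in rows:
        for channel in channels:
            level = row[channel]
            if level == 0:
                continue  # 0 means "don't take"
            item = f"{row['Brand']}-{row['SKU']}"
            groups.setdefault((channel, level), []).append(item)
    return [
        f"Check positioning of {' and '.join(items)} on the {channel.lower()}"
        for (channel, _level), items in groups.items()
    ]


# Assumed input, inferred from the desired output in the question.
rows = [
    {"Brand": 1, "SKU": "cream", "Monitor": 0, "Catalogue": 1},
    {"Brand": 1, "SKU": "oil", "Monitor": 1, "Catalogue": 0},
    {"Brand": 2, "SKU": "detergent", "Monitor": 0, "Catalogue": 1},
    {"Brand": 2, "SKU": "tabs", "Monitor": 1, "Catalogue": 0},
]
tasks = placement_tasks(rows)
```

Because the level is part of the grouping key, rows marked 2 would end up in their own task instead of being merged with the rows marked 1.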
Here's the first one. Note that the format you requested (data entered on separate lines within the same cell using Alt+Enter) isn't supported in Power Query, so I separated the data with commas instead.
Instructions
Add column>add index column
Highlight index columns>transform>pivot>sku as values>advanced options>don't aggregate
Highlight all of the columns to the right>transform>merge columns (choose a separator if you want one, I chose commas)
Transform>Replace ,, with , (may have to do a few times)
Change Brand to text, Discount to %
Add column>custom column formula = "Check discount " & Number.ToText([Discount]*100) & "% for brand " & [Brand] & " for SKU " & Text.Trim([Merged],",")
Before/After
M Code
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Brand", Int64.Type}, {"SKU", type text}, {"Discount", type number}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Added Index", {{"Index", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Added Index", {{"Index", type text}}, "en-US")[Index]), "Index", "SKU"),
#"Merged Columns" = Table.CombineColumns(#"Pivoted Column",{"1", "2", "3", "4", "5"},Combiner.CombineTextByDelimiter(",", QuoteStyle.None),"Merged"),
#"Changed Type1" = Table.TransformColumnTypes(#"Merged Columns",{{"Brand", type text}, {"Discount", Percentage.Type}}),
#"Added Custom" = Table.AddColumn(#"Changed Type1", "Custom", each "Check discount " & Number.ToText([Discount]*100) & "% for brand " & [Brand] & " for SKU " & Text.Trim([Merged],","))
in
#"Added Custom"
2nd Example Instructions
Note: For this one it is easiest to do Monitor and Catalogue separately. Just filter for a different one each time.
Add column>add index column
Highlight index columns>transform>pivot>sku as values>advanced options>don't aggregate
Filter for Monitor =1
Delete Monitor & Catalogue columns
Merge remaining columns, use - as separator
Transpose
Merge columns using , as separator
Find and replace -- with - (may have to do a couple times)
Custom column> Use the formula ="Check placement of " & Text.Trim([Merged]) & " on Monitor"
2nd Example Before/After
2nd Example M-Code
let
Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Brand", Int64.Type}, {"SKU", type text}, {"Monitor", Int64.Type}, {"Catalogue", Int64.Type}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
#"Pivoted Column" = Table.Pivot(Table.TransformColumnTypes(#"Added Index", {{"Index", type text}}, "en-US"), List.Distinct(Table.TransformColumnTypes(#"Added Index", {{"Index", type text}}, "en-US")[Index]), "Index", "SKU"),
#"Filtered Rows" = Table.SelectRows(#"Pivoted Column", each ([Monitor] = 1)),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Monitor", "Catalogue"}),
#"Merged Columns" = Table.CombineColumns(Table.TransformColumnTypes(#"Removed Columns", {{"Brand", type text}}, "en-US"),{"Brand", "0", "1", "2", "3"},Combiner.CombineTextByDelimiter("-", QuoteStyle.None),"Merged"),
#"Transposed Table" = Table.Transpose(#"Merged Columns"),
#"Merged Columns1" = Table.CombineColumns(#"Transposed Table",{"Column1", "Column2"},Combiner.CombineTextByDelimiter(", ", QuoteStyle.None),"Merged"),
#"Replaced Value" = Table.ReplaceValue(#"Merged Columns1","--","-",Replacer.ReplaceText,{"Merged"}),
#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","--","-",Replacer.ReplaceText,{"Merged"}),
#"Added Custom" = Table.AddColumn(#"Replaced Value1", "Custom", each "Check placement of " & Text.Trim([Merged]) & " on Monitor")
in
#"Added Custom"
Hopefully that gets you started on how to apply PQ to your data! You may have to adjust slightly if your data sets vary.
Thanks for your advice, Hooded 0ne. It pointed me in the right direction.
I've only done the 1st part so far; here are the adjustments I made:
Added "$" in front of each SKU to mark the positions that are later replaced with ",". Now I can clear the delimiters left by the merge in the 1st step: replace ";" with blank, replace "$" with ",", and use Text.End or Trim to drop the first "," in the row.
Added a Select Columns step after "Pivoted Column". The list of columns is dynamic, so I can't hardcode "1,2,3,4,5" like you did.
Here's my final code for p1:
#"Add $" = Table.AddColumn(#"Filtered Rows", "SKU_SYMBOL", each "$"&[ROI_LKA_BASE.SKU]),
#"Add Index" = Table.AddIndexColumn(#"Add $", "Index", 0, 1, Int64.Type),
#"Added Custom" = Table.AddColumn(#"Add Index", "TextIndex", each "TASK_"&Number.ToText([Index])),
#"Removed Columns2" = Table.RemoveColumns(#"Added Custom",{"Index", "Shelf start", "Shelf End", "KAM", "Вид Инф. АУ", "Место размещ. АУ", "Адрес", "Attribute", "Value", "ROI_LKA_BASE.Мониторы", "ROI_LKA_BASE.Каталог", "ROI_LKA_BASE.Confirmed with customer", "ROI_LKA_BASE.Confirmed plan", "ROI_LKA_BASE.SKU"}),
#"Pivoted Column" = Table.Pivot(#"Removed Columns2", List.Distinct(#"Removed Columns2"[TextIndex]), "TextIndex", "SKU_SYMBOL"),
ColumnsToSelect = List.Select(Table.ColumnNames(#"Pivoted Column"),each Text.Contains(_,"TASK")),
#"Select Pivoted Columns" = Table.SelectColumns(#"Pivoted Column",ColumnsToSelect),
#"Merged Columns" = Table.CombineColumns(#"Pivoted Column",Table.ColumnNames(#"Select Pivoted Columns"),Combiner.CombineTextByDelimiter(";", QuoteStyle.None),"PIVOT_MERGED"),
#"Replaced Value" = Table.ReplaceValue(#"Merged Columns",";","",Replacer.ReplaceText,{"PIVOT_MERGED"}),
#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","$",",",Replacer.ReplaceText,{"PIVOT_MERGED"}),
#"Added Custom1" = Table.AddColumn(#"Replaced Value1", "TASK", each "Проверить скидку на " & [ROI_LKA_BASE.Бренд] & ": "
& Text.End([PIVOT_MERGED],Text.Length([PIVOT_MERGED])-1)),
#"Grouped Rows" = Table.Group(#"Added Custom1", {"Promo ID", "promotool_code", "ROI_LKA_BASE.Начало акции", "ROI_LKA_BASE.Конец акции", "ROI_LKA_BASE.Chain code", "ROI_LKA_BASE.Описание промо", "ROI_LKA_BASE.Сеть", "TASK"}, {{"Count", each Table.RowCount(_), Int64.Type}})
Will return later with my solution on 2nd part.
Thanks again Hooded 0ne, big help.
I'm trying to scrape some data from a website, but I need the date/time stored in the title attribute of the span tag, as shown below:
<span class="hourAgo ng-binding" title="07/07/2020 às 09:43:33">Há 3 horas</span>
The Power Query looks like this:
Source = Web.BrowserContents("https://www.reclameaqui.com.br/empresa/nestle/lista-reclamacoes/"),
#"Extracted Table From Html" =
Html.Table(Source, {{
"Column1", ".text-title"
}, {
"Column2", ".text-description"
}, {
"Column3", ".status-text"
}, {
"Column4", ".hourAgo" // Here's the class selector I got, but I need the title content
}, {
"Column5", ".mdi-map-marker + *"
}},
[RowSelector=".complain-list:nth-child(1) LI"]),
#"Changed Type" = Table.TransformColumnTypes(#"Extracted Table From Html",{{
"Column1", type text
}, {
"Column2", type text
}, {
"Column3", type text
}, {
"Column4", type text
}, {
"Column5", type text
}})
in
#"Changed Type"
All other columns are fine. So far, that code returns only the span content, "Há 3 horas".
You'll probably want to extract that title separately, since it's an attribute of the span tag rather than its content. One way to do that is to parse the page as text rather than HTML:
Split the HTML text by line feed (new line).
Convert to a table.
Filter to get just the rows containing "hourAgo".
Extract the date between the quotes.
let
Source = Web.BrowserContents("https://www.reclameaqui.com.br/empresa/nestle/lista-reclamacoes/"),
#"Split Text" = Text.Split(Source, "#(lf)"),
#"Converted to Table" = Table.FromList(#"Split Text", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Filtered Rows" = Table.SelectRows(#"Converted to Table", each Text.Contains([Column1], "hourAgo")),
#"Extracted Text Between Delimiters" = Table.TransformColumns(#"Filtered Rows", {{"Column1", each Text.BetweenDelimiters(_, "<span class=""hourAgo ng-binding"" title=""", """>"), type text}})
in
#"Extracted Text Between Delimiters"
You can also change this slightly to include one of the other columns so you can merge back with your original table:
let
Source = Web.BrowserContents("https://www.reclameaqui.com.br/empresa/nestle/lista-reclamacoes/"),
#"Split Text" = Text.Split(Source, "#(lf)"),
#"Converted to Table" = Table.FromList(#"Split Text", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Filtered Rows" = Table.SelectRows(#"Converted to Table", each Text.Contains([Column1], "hourAgo")),
#"Split Column by Delimiter" = Table.SplitColumn(#"Filtered Rows", "Column1", Splitter.SplitTextByDelimiter("</span>", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10", "Column1.11"}),
#"Removed Other Columns" = Table.SelectColumns(#"Split Column by Delimiter",{"Column1.4", "Column1.7"}),
#"Extracted Text After Delimiter" = Table.TransformColumns(#"Removed Other Columns", {{"Column1.4", each Text.BetweenDelimiters(_, "title=", ">"), type text}, {"Column1.7", each Text.Trim(Text.AfterDelimiter(_, ">")), type text}})
in
#"Extracted Text After Delimiter"
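The text-splitting approach above depends on the exact attribute order and line breaks in the page source. As an alternative illustration, here is a minimal Python sketch that reads the title attribute with the standard library's HTML parser. It assumes you already have the rendered HTML as a string (the page appears to be filled in by JavaScript, so a plain download may not contain these spans); the class name TitleCollector and the one-line sample are made up around the span shown in the question.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collect the title attribute of every <span> carrying a given class."""

    def __init__(self, css_class="hourAgo"):
        super().__init__()
        self.css_class = css_class
        self.titles = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # The class attribute may hold several names, e.g. "hourAgo ng-binding".
        classes = (attributes.get("class") or "").split()
        if tag == "span" and self.css_class in classes and "title" in attributes:
            self.titles.append(attributes["title"])


html = (
    '<li><span class="hourAgo ng-binding" '
    'title="07/07/2020 às 09:43:33">Há 3 horas</span></li>'
)
collector = TitleCollector()
collector.feed(html)
titles = collector.titles
```

Matching on the tag and class rather than on a literal string keeps working if the site reorders the attributes or adds another class.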