I'm trying to do web scraping with Power BI, using the data from the following site:
https://pt.wikipedia.org/wiki/Jogo_do_bicho
After passing the site URL, the data came organized in the following format:
![Screenshot 1][1]
[1]: https://i.stack.imgur.com/HPjE7.png
Here each number is an index for an animal, and each animal has its own specific thousand. How do I reorganize everything into a single column containing all the indices?
I have an example attached:
![Screenshot 2][2]
[2]: https://i.stack.imgur.com/cxWbU.png
I'll try to add detail later but I think this will work:
let
Source = Web.Page(Web.Contents("https://pt.wikipedia.org/wiki/Jogo_do_bicho")){0}[Data],
ToLists = List.Skip(Table.ToColumns(Source),1),
#"Converted to Table" = Table.FromList(ToLists, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Expanded Column1" = Table.ExpandListColumn(#"Converted to Table", "Column1"),
#"Added Custom" = Table.AddColumn(#"Expanded Column1", "Pivot", each if Text.Length([Column1]) = 2 then "Group" else "Animal"),
#"Added Index" = Table.AddIndexColumn(#"Added Custom", "Index", 0, 1),
#"Integer-Divided Column" = Table.TransformColumns(#"Added Index", {{"Index", each Number.IntegerDivide(_, 2), Int64.Type}}),
#"Pivoted Column" = Table.Pivot(#"Integer-Divided Column", List.Distinct(#"Integer-Divided Column"[Pivot]), "Pivot", "Column1"),
#"Split Column by Delimiter" = Table.SplitColumn(#"Pivoted Column", "Animal", Splitter.SplitTextByDelimiter("#(lf)#(cr)", QuoteStyle.Csv), {"Animal", "Values"}),
#"Trimmed Text" = Table.TransformColumns(#"Split Column by Delimiter",{{"Animal", Text.Trim, type text}, {"Values", Text.Trim, type text}}),
#"Changed Type" = Table.TransformColumnTypes(#"Trimmed Text",{{"Group", Int64.Type}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Index"}),
#"Sorted Rows" = Table.Sort(#"Removed Columns",{{"Group", Order.Ascending}})
in
#"Sorted Rows"
Edit: The key here is to convert the table into a list of columns using Table.ToColumns. This turns it into a list of lists that we can convert into a table and expand into one long column.
Once all of the columns are stacked into one single column, we want to separate the group id from the details, which we can do in this case by checking the length of the text and defining a custom column that labels each row with a different data category.
With that categorization of the rows in place, we want to pivot that new custom column but we want an index column so it knows what stays together. Add an index column and integer divide by two so you get 0,0,1,1,2,2,3,3,... so that each pair gets its own unique ID. Now we can finally pivot.
Once pivoted, do any cleanup you feel like, e.g., splitting columns, trimming whitespace, changing column types, removing unneeded columns, and sorting.
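As a minimal, self-contained sketch of the stacking and pairing steps described above: the inline table, its column names A and B, and the step names are made up for illustration. It uses List.Combine with Table.FromColumns as a shorter alternative to the Table.FromList / Table.ExpandListColumn pair.

```powerquery
let
    // Hypothetical sample: each column alternates a two-character group id and its details
    Source = #table(
        {"A", "B"},
        {
            {"01", "05"},
            {"Avestruz", "Cachorro"},
            {"02", "06"},
            {"Águia", "Cabra"}
        }
    ),
    // Table.ToColumns returns a list of lists, one inner list per column
    Columns = Table.ToColumns(Source),
    // Stack all columns into one long single-column table
    Stacked = Table.FromColumns({List.Combine(Columns)}, {"Column1"}),
    // Label each row by the length of its text
    Labeled = Table.AddColumn(Stacked, "Pivot", each if Text.Length([Column1]) = 2 then "Group" else "Animal", type text),
    // 0,1,2,3,... becomes 0,0,1,1,... so each id/details pair shares one key
    Indexed = Table.AddIndexColumn(Labeled, "Index", 0, 1),
    Paired = Table.TransformColumns(Indexed, {{"Index", each Number.IntegerDivide(_, 2), Int64.Type}}),
    // Pivot the labels into columns, one row per pair
    Pivoted = Table.Pivot(Paired, {"Group", "Animal"}, "Pivot", "Column1"),
    Result = Table.RemoveColumns(Pivoted, {"Index"})
in
    Result
```

This should produce one row per pair with a Group column and an Animal column. The length test only works while no details cell is exactly two characters long.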
I am working with data that is structured in a parent-child relationship. Every level already has the rolled-up value (the sum of its children). Therefore, I want Power BI to display the value that is shown at every level (or the sum within the same level) and not aggregate between parent and child. Also, the sums of the children don't always equal the parents, because some details are missing at the lower level. I do still need the parent-child relationship in Power BI for drill-through purposes. Attached is an example of how the data is structured. The table on the left is how I receive it. I have a dynamic number of levels, so ideally the solution wouldn't have a hard-coded number of levels. Any thoughts?
There's no way around having a fixed number of levels in Power BI, but you can easily write a measure that takes the MAX rather than the SUM of the values.
You can apply the steps below to your table in the Advanced Editor to get your desired output. Replace previous_step_name (referenced in my first step) with the name of your own previous step to make this code work.
let
//Your previous steps,
#"Added Index" = Table.AddIndexColumn(#"previous_step_name", "Index", 1, 1, Int64.Type),
tab_before_split = Table.ReorderColumns(#"Added Index",{"Index", "Item", "Value"}),
tab_split = Table.ExpandListColumn(Table.TransformColumns(#"tab_before_split", {{"Item", Splitter.SplitTextByDelimiter("/", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Item"),
tab_group_by = Table.Group(#"tab_split",{"Index"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
tab_new = Table.Join(tab_before_split,"Index", tab_group_by,"Index"),
#"Split Column by Delimiter" = Table.SplitColumn(tab_new, "Item", Splitter.SplitTextByDelimiter("/", QuoteStyle.Csv), {"Item.1", "Item.2", "Item.3"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Item.1", type text}, {"Item.2", type text}, {"Item.3", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type1", "Level", each [Count] - 1),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Count"}),
#"Changed Type2" = Table.TransformColumnTypes(#"Removed Columns",{{"Level", Int64.Type}})
in
#"Changed Type2"
Sample input-
Sample output-
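If only the Level column is needed, a shorter sketch is possible. It assumes, as in the steps above, that Item holds a "/"-delimited path; the sample rows and the AddLevel step name are hypothetical.

```powerquery
let
    // Hypothetical sample mirroring the Item and Value columns used above
    Source = #table(
        {"Item", "Value"},
        {
            {"A", 100},
            {"A/B", 60},
            {"A/B/C", 25}
        }
    ),
    // Level = number of "/"-separated segments minus one, so the root is level 0
    AddLevel = Table.AddColumn(Source, "Level", each List.Count(Text.Split([Item], "/")) - 1, Int64.Type)
in
    AddLevel
```

This gives the same number as counting the rows produced by the split and subtracting one, but without the index, group, and join steps.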
Hello beautiful people,
Could someone please help me with the request below? I am working in the Power Query Editor, so maybe it can be done by creating conditional columns in Power Query.
I need to group users in a table by category and show their count across multiple fields, as in the example below:
I need results to be:
Muuuuuuuuch Appreciated
In Power Query, try this:
Click select period and name columns
Right click, unpivot other columns
Click select period, attribute and value columns
Right click, group ... new column name Count, operation Count rows
Click select attribute column. Transform ... pivot column... values column Count, Advanced options, do not aggregate
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Period", "Name"}, "Attribute", "Value"),
#"Grouped Rows" = Table.Group(#"Unpivoted Other Columns", {"Period", "Attribute", "Value"}, {{"Count", each Table.RowCount(_), type number}}),
#"Pivoted Column" = Table.Pivot(#"Grouped Rows", List.Distinct(#"Grouped Rows"[Attribute]), "Attribute", "Count", List.Sum)
in #"Pivoted Column"
Alternately,
Click select period and name columns
Right click, unpivot other columns
Right click name column and remove
Right click value column and duplicate
Click select attribute column. Transform ... pivot column ... Values column: Value - Copy, aggregated by count
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Period", "Name"}, "Attribute", "Value"),
#"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns",{"Name"}),
#"Duplicated Column" = Table.DuplicateColumn(#"Removed Columns", "Value", "Value - Copy"),
#"Pivoted Column" = Table.Pivot(#"Duplicated Column", List.Distinct(#"Duplicated Column"[Attribute]), "Attribute", "Value - Copy", List.Count)
in #"Pivoted Column"
Neither one is going to put in a blank row for a combination that does not exist, like Dec No. That's more complicated, if required.
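To try the first approach without a workbook, here is a sketch on an inline table. Only the Period and Name column names come from the steps above; the Q1 and Q2 columns and all the values are invented for illustration.

```powerquery
let
    // Hypothetical sample; Q1 and Q2 stand in for the fields being counted
    Source = #table(
        {"Period", "Name", "Q1", "Q2"},
        {
            {"Nov", "Ann", "Yes", "No"},
            {"Nov", "Bob", "Yes", "Yes"},
            {"Dec", "Cid", "Yes", "Yes"}
        }
    ),
    // One row per Period / Name / field
    Unpivoted = Table.UnpivotOtherColumns(Source, {"Period", "Name"}, "Attribute", "Value"),
    // Count how many users gave each answer, per period and field
    Grouped = Table.Group(Unpivoted, {"Period", "Attribute", "Value"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
    // Spread the fields back out into columns holding the counts
    Pivoted = Table.Pivot(Grouped, List.Distinct(Grouped[Attribute]), "Attribute", "Count", List.Sum)
in
    Pivoted
```

With this sample there is no row for Dec / No, because that combination never occurs in the source.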
I have a PostgreSQL database where each row represents a day and each column represents a customer attribute measured on that day. The database is updated daily by Python code. I am currently building a dashboard in Power BI to share the data with stakeholders. I want to add a line chart that shows how one column's values change over time, displaying the percentage change for each day. In Excel it would look like this:
You can accomplish this in Power Query (i.e. during data import and transformation) as follows:
Load the data, making sure the rows are ordered by date ascending:
Add an Index column "From 0", then another Index column "From 1":
Merge the table with itself, selecting "Index" first and "Index.1" second:
Expand "Column 1" from the new column added to your table:
Subtract the new column from the original value (select "Column 1" and "Added Index1.Column 1", then go to Add Column > Standard > Subtract):
Remove all unneeded columns:
Of course, you can then rename columns as necessary.
The Power Query code in this example is as follows:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("TczJCcAwDETRXnQ2aIsSqRbj/tuwCQqZ62P+zEkWnGyiRYNUhNY4doNF2wOWbQlWbfWbdesC1q0rWMVryvWRe98BXWe1Ng==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Date = _t, #"Column 1" = _t]), // set up the table as shown in your example
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}, {"Column 1", Int64.Type}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
#"Added Index1" = Table.AddIndexColumn(#"Added Index", "Index.1", 1, 1),
#"Merged Queries" = Table.NestedJoin(#"Added Index1", {"Index"}, #"Added Index1", {"Index.1"}, "Added Index1", JoinKind.LeftOuter),
#"Expanded Added Index1" = Table.ExpandTableColumn(#"Merged Queries", "Added Index1", {"Column 1"}, {"Added Index1.Column 1"}),
#"Inserted Subtraction" = Table.AddColumn(#"Expanded Added Index1", "Subtraction", each [Column 1] - [Added Index1.Column 1], Int64.Type),
#"Removed Columns" = Table.RemoveColumns(#"Inserted Subtraction",{"Index", "Index.1", "Added Index1.Column 1"})
in
#"Removed Columns"
You can also compute the day-over-day change and plot the values as a percentage.
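Since the question asks for a percentage rather than a difference, here is one possible sketch. It looks up the previous row by index instead of merging the table with itself. The sample rows, the "Pct Change" column name, and the step names are assumptions; "Column 1" matches the column name used above.

```powerquery
let
    // Hypothetical sample data
    Source = #table(
        {"Date", "Column 1"},
        {
            {#date(2021, 1, 1), 100},
            {#date(2021, 1, 2), 110},
            {#date(2021, 1, 3), 99}
        }
    ),
    // The previous-row lookup relies on the rows being in date order
    Sorted = Table.Sort(Source, {{"Date", Order.Ascending}}),
    Indexed = Table.AddIndexColumn(Sorted, "Index", 0, 1),
    // Buffer the values once so each row lookup does not re-evaluate the column
    Values = List.Buffer(Indexed[Column 1]),
    // (current - previous) / previous; null for the first row or a zero previous value
    WithPct = Table.AddColumn(
        Indexed,
        "Pct Change",
        each if [Index] = 0 or Values{[Index] - 1} = 0
            then null
            else ([Column 1] - Values{[Index] - 1}) / Values{[Index] - 1},
        Percentage.Type
    ),
    Result = Table.RemoveColumns(WithPct, {"Index"})
in
    Result
```

The first row has no previous day, so its percentage change is left as null.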
I have a spreadsheet that contains column Names as the product name, quantity, cost.
I want to convert this to rows of data that contain Product Name, Quantity, Cost.
See image below as to what I want.
What is the best way to handle this in Power Query M Language?
Not sure if I want to pivot just the columns that have prod name, quantity and cost?
Thanks
Here's a way...
Starting with this table as Table1:
You can select the Customer column and Unpivot Other Columns to get this:
Then you can add an index column (keep it named Index) and then also a custom column (keep it named Custom) with if Text.EndsWith([Attribute],"Cost") then 1 else 0 as its formula to get this:
Then add another custom column... Name it Total Cost and enter #"Unpivoted Other Columns"[Value]{[Index]+(List.Count(#"Added Custom"[Custom])/List.Sum(#"Added Custom"[Custom]))} as its formula to get:
The two steps above were, first, to set up to locate the corresponding Cost of the Tshirts based on the Cost's position in the Value column and, then, to actually locate the cost and record it on the same line as the respective Tshirts. The Index column provides row positioning information while the Custom column provides count information--both the overall list count and the count of rows with Cost. I use the count information to determine how many index positions to move down the Value column to get associated cost values dynamically.
Then filter on the Attribute column, using Text Filters > Does Not End With... and type the word Cost. All the rows with an Attribute entry ending with the word Cost should disappear:
Remove the Index and Custom columns and Rename the Attribute and Value columns to Product Name and Quantity, respectively to get your final result:
Here's my M code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Customer"}, "Attribute", "Value"),
#"Added Index" = Table.AddIndexColumn(#"Unpivoted Other Columns", "Index", 0, 1),
#"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each if Text.EndsWith([Attribute],"Cost") then 1 else 0),
#"Added Custom2" = Table.AddColumn(#"Added Custom", "Total Cost", each #"Unpivoted Other Columns"[Value]{[Index]+(List.Count(#"Added Custom"[Custom])/List.Sum(#"Added Custom"[Custom]))}),
#"Filtered Rows" = Table.SelectRows(#"Added Custom2", each not Text.EndsWith([Attribute], "Cost")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Index", "Custom"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Attribute", "Product Name"}, {"Value", "Quantity"}})
in
#"Renamed Columns"
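One caveat, offered as a sketch rather than a correction of the code above: the row count divided by the number of Cost rows is 2 whenever every product has one quantity column and one cost column. The offset that is actually needed equals the number of products, so the formula lines up only for two products. The variant below counts the cost columns instead. Its sample table, names, and step names are hypothetical, and it assumes the quantity columns come first, the cost columns follow in the same order, and no cell is null (unpivoting drops nulls).

```powerquery
let
    // Hypothetical sample with three products, so the required offset is 3
    Source = #table(
        {"Customer", "Red", "Green", "Blue", "Red Cost", "Green Cost", "Blue Cost"},
        {
            {"Dave", 1, 2, 3, 10, 20, 30},
            {"Chica", 4, 5, 6, 40, 50, 60}
        }
    ),
    // Offset = number of cost columns = number of products
    Offset = List.Count(List.Select(Table.ColumnNames(Source), each Text.EndsWith(_, "Cost"))),
    Unpivoted = Table.UnpivotOtherColumns(Source, {"Customer"}, "Attribute", "Value"),
    Indexed = Table.AddIndexColumn(Unpivoted, "Index", 0, 1),
    // Buffer the Value column once for positional lookups
    Values = List.Buffer(Indexed[Value]),
    // Drop the cost rows first so the lookup never runs past the end of the list
    QuantityRows = Table.SelectRows(Indexed, each not Text.EndsWith([Attribute], "Cost")),
    // The matching cost sits Offset positions further down the Value column
    WithCost = Table.AddColumn(QuantityRows, "Total Cost", each Values{[Index] + Offset}),
    Cleaned = Table.RemoveColumns(WithCost, {"Index"}),
    Result = Table.RenameColumns(Cleaned, {{"Attribute", "Product Name"}, {"Value", "Quantity"}})
in
    Result
```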
The key here is pivoting and unpivoting.
Starting with a table like this,
Select the right four columns and click Transform > Unpivot Columns to get this table:
Now create a custom column that classifies the value using this formula.
if Text.EndsWith([Attribute], "Cost") then "Cost" else "Quantity"
I also chopped off the " Cost" piece at the end of the Attribute column. You can either Transform > Replace Values and replace " Cost" with nothing or Transform > Extract > Text Before Delimiter " Cost".
Now pivot the custom column (choose the Value column as your Values Column choice) and, finally, rename the Attribute column to Product Name.
Here's my M code for all the steps:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcknNK0stUnAuLS7Jz00tUtJRMjIGEoYmIJapqZ6pAYhnZKpnYKAUqxOt5JyRmZyYno+swdAQSJiagtUZgNQBeeYQDbEA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Customer = _t, #"Product Orange T-shirt" = _t, #"Product Blue T-shirt" = _t, #"Product Orange T-shirt Cost" = _t, #"Product Blue T-shirt Cost" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Customer", type text}, {"Product Orange T-shirt", Int64.Type}, {"Product Blue T-shirt", Int64.Type}, {"Product Orange T-shirt Cost", type number}, {"Product Blue T-shirt Cost", type number}}),
#"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"Customer"}, "Attribute", "Value"),
#"Added Custom" = Table.AddColumn(#"Unpivoted Columns", "Custom", each if Text.EndsWith([Attribute], "Cost") then "Cost" else "Quantity"),
#"Replaced Value" = Table.ReplaceValue(#"Added Custom"," Cost","",Replacer.ReplaceText,{"Attribute"}),
#"Pivoted Column" = Table.Pivot(#"Replaced Value", List.Distinct(#"Replaced Value"[Custom]), "Custom", "Value", List.Sum),
#"Renamed Columns" = Table.RenameColumns(#"Pivoted Column",{{"Attribute", "Product Name"}})
in
#"Renamed Columns"
I have the above table:
The table can have multiple records for same id (title will be different)
Now I need to add a CALCULATED COLUMN "Parent Title".
So, if parent id is 1, then parent title should be the last title of id 1 (Title 11).
If parent id is 2, then parent title should be the last title of id 2 (Title 22).
And if parent id is BLANK, then parent title should be BLANK.
How can I achieve this?
There may be a cleaner way to do it, but I was able to get this via Power Query:
Here's how:
I started with your table (I called it Table1):
Then I used Table1 as a source for creating Table2. (I used "Reference" to do that: I right-clicked on Table1 and selected "Reference" from the drop down.)
Then I transformed Table2 into the following. (I did some sorting first, then some other "fun stuff." You can see what all I did in the M code.)
Here's the M code for Table2:
let
Source = Table1,
#"Sorted Rows" = Table.Sort(Source,{{"Id", Order.Ascending}, {"title", Order.Ascending}, {"Parent Id", Order.Ascending}}),
#"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 1, 1),
#"Grouped Rows" = Table.Group(#"Added Index", {"Id"}, {{"MinIndex", each List.Min([Index]), type number}, {"MaxIndex", each List.Max([Index]), type number}, {"AllData", each _, type table}}),
#"Expanded AllData" = Table.ExpandTableColumn(#"Grouped Rows", "AllData", {"title", "Parent Id", "Index"}, {"title", "Parent Id", "Index"}),
#"Added Custom" = Table.AddColumn(#"Expanded AllData", "MaxTitle", each if [MaxIndex] = [Index] then [title] else null),
#"Filled Up" = Table.FillUp(#"Added Custom",{"MaxTitle"}),
#"Added Custom1" = Table.AddColumn(#"Filled Up", "LesserMaxIndex", each [MinIndex]-1)
in
#"Added Custom1"
(You can copy the M code above and paste it over the initial code in Table2's query, which was generated during the "Reference", in the "Advanced Editor"...)
Then I used Table2 as a source for creating Table3. (I used "Reference" again.)
Table3's M code is very simple:
let
Source = Table2
in
Source
And finally, I merged Table2 and Table3 by using "Merge Queries" on the "Home" tab. Specifically, I used "Merge Queries as New".
(Note I matched LesserMaxIndex from Table2 with MaxIndex from Table3 and used a Left Outer join.)
I named the merged query "Merge1".
Then I did some cleanup to Merge1, which you can see in the M code:
let
Source = Table.NestedJoin(Table2,{"LesserMaxIndex"},Table3,{"MaxIndex"},"Table1 (4)",JoinKind.LeftOuter),
#"Expanded Table1 (4)" = Table.ExpandTableColumn(Source, "Table1 (4)", {"MaxTitle"}, {"ParentTitle"}),
#"Sorted Rows" = Table.Sort(#"Expanded Table1 (4)",{{"Id", Order.Ascending}, {"title", Order.Ascending}, {"Parent Id", Order.Ascending}}),
#"Removed Duplicates" = Table.Distinct(#"Sorted Rows"),
#"Removed Other Columns" = Table.SelectColumns(#"Removed Duplicates",{"Id", "title", "Parent Id", "ParentTitle"})
in
#"Removed Other Columns"
This is fairly easy to do with an Excel worksheet formula along the lines of
=IF(Table1[@[Parent ID]]="","",INDEX(B:B,MATCH(Table1[@[Parent ID]],A:A,1)))
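By its nature, DAX, the language of Power Pivot and Power BI, does not have a range-based lookup like INDEX/MATCH or VLOOKUP, since such lookups are normally done by relating tables. (LOOKUPVALUE exists, but it expects the search values to identify a single result, which is not the case here because an id can have several titles.)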
If you are in a situation where you want to calculate something like that, it might be wise to re-evaluate your data architecture. Create another query that loads your initial table, sort it, remove the duplicate IDs and keep only the ones you want. Then you can use relationships in Power Pivot or use the query to merge with the original data table.
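A rough sketch of that suggestion in a single query, using the merge route. The sample rows and step names are hypothetical, and it assumes the rows are already ordered so that the last title per Id is the one wanted.

```powerquery
let
    // Hypothetical sample matching the Id / title / Parent Id columns used above
    Source = #table(
        {"Id", "title", "Parent Id"},
        {
            {1, "Title 1", null},
            {1, "Title 11", null},
            {2, "Title 2", 1},
            {2, "Title 22", 1},
            {3, "Title 3", 2}
        }
    ),
    // One row per Id, keeping the last title within each group
    LastTitles = Table.Group(Source, {"Id"}, {{"ParentTitle", each List.Last([title]), type text}}),
    // Left outer join: rows with a blank Parent Id find no match and get null
    Merged = Table.NestedJoin(Source, {"Parent Id"}, LastTitles, {"Id"}, "Parent", JoinKind.LeftOuter),
    Result = Table.ExpandTableColumn(Merged, "Parent", {"ParentTitle"})
in
    Result
```

The merge may return rows in a different order than the source, so sort afterwards if the order matters.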