I imported a table from SQL which is more than 100k rows. i was connecting it to data model
but then i found that it has some duplicate rows. i identified those duplicate rows which are
they repeating twice i want to keep first and delete whole second copy of them. i tried for hours in query editor adding custom column but it just would let me add new column condition as data view in power bi did.
Hope, you'll get a better solution form SO contributors, if not you can try like this.
let
Source=...
filtered =
Table.RemoveMatchingRows(
Source
,{[Keycode= "qwe123", Desc="rty456"]}
,"Desc"
)
in
filtered
Edited: Group, and find the first occurrence, like here
#"Grouped Rows" = Table.Group(#"PriorStepNameHere", {"TestColumn1","TestColumn2"}, {{"data", each Table.RemoveColumns(Table.AddColumn(Table.AddIndexColumn(_, "Index", 0, 1, Int64.Type), "First Instance?", each if [Index]=0 then 1 else 0),{"Index"}), type table }}),
then expand
Related
How to make a calculations in Power Bi using whole list for all rows. I tried to do this way
but I got error "A cyclic reference was encountered during evaluation". Is this a way to put whole column in calculated fields and make with it some calculations?
This is my table
My desire is for example if(OrderID{index}="A" then Shippedwwo{index} else Shippedowo{index -1} like in a picture below.
To get your posted output, there is no need for the Index column (or for a custom function, for that matter).
#"Added Custom" = Table.AddColumn(previousStep, "test_column",
each if [Action]="A" then [Assignwwo] else null),
#"Filled Down" = Table.FillDown(#"Added Custom",{"test_column"})
orig data
output
I have data from an external source that is downloaded in csv format. This data shows the interactions from several users and doesn't have an id column. The problem I'm having is that I'm not able to use index because multiple entries represent interactions and processes. The interactions would be the group of processes a specific user do and the process represents each actions taken in a specific interaction. Any user could repeat the same interaction at any time of day. The data looks likes this:
User1 has 2 processes but there were 3 interactions. How can I assign an ID for each interaction having into consideration that there might be multiple processes for a single user in the same day. I tried grouping them in Power Query but it groups the overall processes and I'm not able to distinguish the number of interactions. Is it better to do it in Dax?
Edit:
I notice that it is hard to understand what I need but I think this would be a better way to see it:
Process 2 are the steps done in an interaction. Like in the column in yellow I need to add an ID taking in to consideration where an interaction start and where it ends.
I'm not exactly sure I follow what you describe. It looks to me like user1 has 4 interactions--Processes AA, AB, BA, and BB--but you say 3.
Still, I decided to take a shot at providing an answer anyway. I started with a CSV file set up like you show.
Then brought the CSV into Power Query and, just to add a future point of reference so that you could follow the Id assignments better, I added an index column that I called startingIndex.
Then I added a custom column combining the processes that I understand actually define an interaction.
Then I grouped everything by users and Interactions into a column named allData.
Then I added a custom column to copy the column that was created from the earlier grouping, to sort the tables within it, and to add an index to each table within it. This essentially indexed each user's interaction group. (Because all of your interactions occur on the same date(s), the sorting doesn't help much. But I did it to show where you could do it if you included datetime info instead of just a date.)
Then I added a custom column to copy the column that was created earlier to add the interactions index, and to add an Id item within each table within it. I constructed each Id by combining the user, interactions, and interactionIndex for each.
Then I selected the latest column I had created (complexId) and removed all other columns.
Last, I expanded all tables without including the Interactions and Index columns. (The Index column was the index used for the interactions within the groups and no longer needed.) I included the startingIndex column just so you could see where items originally were at the start, in comparison to their final Id.
Given your new example, to create the Interaction ID you show, you only need the first two columns of the table. If not part of the original data, you can easily generate the third column (Process2))
It appears you want to increment the interaction ID whenever the Process changes
Please read the comments in the M code and explore the Applied Steps to better understand the algorithm:
M Code
let
//be sure to change table name in next row to your real table name
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{
{"User", type text},
{"Process", type text},
{"Process2", type text}
}),
//add an index column
idx = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
//Custom column returns the Index if
// the current Index is 0 (first row) or
// there has been no change in user or process comparing current/previous row
// else return null
#"Added Custom" = Table.AddColumn(idx, "Custom",
each if [Index]=0
then 0
else
if [Process] <> idx[Process]{[Index]-1} or [User] <> idx[User]{[Index]-1} then [Index]
else null),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Index"}),
//Fill down the custom column
// now have same number for each interactive group
#"Filled Down" = Table.FillDown(#"Removed Columns",{"Custom"}),
//Group by the "filled down" custom column with no aggregation
#"Grouped Rows" = Table.Group(#"Filled Down", {"Custom"}, {
{"all", each _, type table [User=nullable text, Process=nullable text, Process2=nullable text, Custom=number]}
}),
//add a one-based Index column to the grouped table
#"Added Index" = Table.AddIndexColumn(#"Grouped Rows", "Interaction ID", 1, 1, Int64.Type),
#"Removed Columns1" = Table.RemoveColumns(#"Added Index",{"Custom"}),
//Re-expand the table
#"Expanded all" = Table.ExpandTableColumn(#"Removed Columns1", "all",
{"User", "Process", "Process2"}, {"User", "Process", "Process2"})
in
#"Expanded all"
Source
Results
I have a table that's generated when I pull data from an accounting software - the example columns are months/years in the format as follows (It pulls all the way to current day, and the last month will be partial month data):
Nov_2020
Dec_2020
Jan_2021
Feb_1_10_2021 (Current month, column to remove)
... So on and so forth.
My goal I have been trying to figure out is how to use the power query editor to remove the last column (The partial month) - I tried messing around with the text length to no avail (The goal being to remove anything with text length >8, so the full months data would show but the last month shouldn't). I can't just remove based on a text filter, because if someone were to pull the data 1 year from now it would have to account for 2021/2022.
Is this possible to do in PQ? Sorry, I'm new to it so if I need to elaborate more I can.. Thanks!
You can do this with Table.SelectColumns where you use List.Select on the Table.ColumnNames.
= Table.SelectColumns(
PrevStep,
List.Select(Table.ColumnNames(PrevStep), each Text.Length(_) <= 8)
)
Although both Alexis Olson's and Justyna MK's answers are valid, there is another approach. Since it appears that you're getting data for each month in a separate column, what you will surely want to do is unpivot your data, that is transform those columns into rows. It's the only sensible way to get a good material for analysis, therefore, I would suggest to unpivot the columns first, then simply filter out rows containing the last month.
To make it dynamic, I would use unpivot other columns option - you select columns and it will transform remaining columns into row in such a way that two columns will be created - one that will contain column names in rows and the other one will contain values.
To illustrate what I mean by unpivoting, when you have data like this:
You're automatically transforming that into this:
You can try to do it through Power Query's Advanced Editor. Assign the name of the last column to LastColumn variable and then use it in the last step (Removed Columns).
let
Source = Excel.Workbook(File.Contents(Excel file path), null, true),
tblPQ_Table = Source{[Item="tblPQ",Kind="Table"]}[Data],
#"Changed Type" = Table.TransformColumnTypes(tblPQ_Table,{{"Nov_2020", Int64.Type}, {"Dec_2020", Int64.Type}, {"Jan_2021", Int64.Type}, {"Feb_1_10_2021", Int64.Type}}),
LastColumn = List.Last(Table.ColumnNames(#"Changed Type")),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type",{LastColumn})
in
#"Removed Columns"
The picture I have attached shows what my power query table looks like (exactly the same as source file) and then underneath what I would like the final end product to look like.
Correct me if I'm wrong but I thought the purpose of power query/power bi was to not manipulate the source file but do this in power query/power bi?
If that's the case, how can I enter new columns and data to the existing table below?
You can add custom columns without manipulating source file in power bi. Please refer to below link.
https://learn.microsoft.com/en-us/power-bi/desktop-add-custom-column
EDIT: Based on your comment editing my answer - Not sure if this helps.
Click on edit queries after loading source file to power bi.
Using 'Enter Data' button entered sample data you provided and created new table. Data can be copy pasted from excel. You can enter new rows manually. Using Tag number column to keep reference.
Merge Queries - Once the above table is created merged it with original table on tag number column.
Expand Table - In the original table expand the merged table. Uncheck tag number(as it is already present) and uncheck use original column name as prefix.
Now the table will look like the way you wanted it.
You can always change data(add new columns/rows) manually in new table by clicking on gear button next to source.
Here is the closest solution to what I found from "manual data entry" letting you as much freedom as you would like to add rows of data, if the columns that you want to create do not follow a specific pattern.
I used an example for the column "Mob". I have not exactly reproduced the content of your cells but I hope that this will not be an issue to understand the logic.
Here is the data I am starting with:
Here is the Power Query in which I "manually" add a row:
#"Added Conditional Column" = Table.AddColumn(#"Changed Type", "Mob", each if [Tag Number] = "v" then null else null),
NewRows = Table.InsertRows(#"Added Conditional Column", 2, {[Mob="15-OHIO", Tag Number="4353654", Electronic ID=1.5, NLIS="", Date="31/05/2015", Live Weight="6", Draft="", Condition store="", Weighing Type="WEAN"]})
in
NewRows
1) I first created a column with only null values:
#"Added Conditional Column" = Table.AddColumn(#"Changed Type", "Mob", each if [Tag Number] = "v" then null else null),
2) With the "Table.InsertRows" function:
I indicated the specific line: 2, (knowing that power Bi start counting at zero, at the "headers" so it will the third line in the file)
I indicated the column at which I wanted to insert the value, i.e "Mob"
I indicated the value that all other other rows should have:
NewRows = Table.InsertRows(#"Added Conditional Column", 2, {[Mob="15-OHIO", Tag Number="4353654", Electronic ID=1.5, NLIS="", Date="31/05/2015", Live Weight="6", Draft="", Condition store="", Weighing Type="WEAN"]})
Here is the result:
I hope this helps.
You can apply this logic for all the other rows.
I do not think that this is very scalable however, becaue you have to indicate each time the values of the rows in the other columns as well. There might be a better option.
Is there a way I can reorder columns in 'Data View' within Power BI? I tried doing it in Power Query first but when the data loads into the table, the columns automatically rearrange.
Edit after comment.
There is easy fix to enforce column order just as in Power Query:
In Power Query Editor > Disable Query Load. Close and Apply.
Open the Query Editor again, enable the Query Load. Refresh the Query. Then Close and Apply.
Answer to misunderstood question.
This may be interesting solution in M PowerQuery. The function below let you reorder columns by only stating few columns from the whole set of columns. Add this in blank query and rename it to FnReorderColumnsSubset.
(tbl as table, reorderedColumns as list, offset as number) as table =>
Table.ReorderColumns
(
tbl,
List.InsertRange
(
List.Difference
(
Table.ColumnNames(tbl),
reorderedColumns
),
offset,
reorderedColumns
)
)
Use it as this:
= FnReorderColumnsSubset( Source, { "Region", "RegionManager", "HeadCount" }, 0 )
Found it here:
https://datachant.com/2017/01/18/power-bi-pitfall-4/
It is extremely stupid way, but it is working - i found it by accident:
In edit query view remove the column, then save changes. You will see that in data view that column was removed as well.
Now again in edit query view remove from applied steps the action that removed the column. Save it again.
You will see that removed previously column was added to the end of the table.
This way you can arrange your columns to have it the way you want it in data view.
Hope it helped.
I don't know that you can rearrange an existing table, but if you re-create it as a new table, you can pick the order you want.
NewTable =
SELECTCOLUMNS (
OldTable,
"Column1", OldTable[Column1],
"Column2", OldTable[Column2],
"Column3", OldTable[Column3]
)
I think most here have misunderstood the problem, except #Jacko. So far as I know it is now possible to re-arrange columns in Power Query and load to the model and the table will load in the column order you specified in PQ. The problem is in dataview in the modelling layer of PBi. Here you can add many calculated columns, but, any new column you add is always placed at far right and can't be moved. Yes, I know about SELECTCOLUMNS but it isn't a solution as the new table does not have the editable formulae. A solution is a drag and drop feature of some sort. PBi users are still waiting for it despite the problem being flagged in MS Forums some years ago. No progress TIKO other than the limp SELECTCOLUMNS solution.