I'm working in Power BI, and I need to do this in DAX to keep it from having to re-read the 200K PDF files (8-hour refresh time).
I have a table with duplicated ID and Step values, each with a different time stamp. I need to find the earliest time stamp for each ID and subtract it from the time stamps of all rows with a matching ID. I can then use the resulting delta time value to filter the table.
I'm struggling because I need to compare one ID from the table to all of the other IDs looking for matches.
Example Data:
Final Data:
This post got me close, but in its IF statements they compare to "Yes" rather than to the ID in the row: How to check for duplicates with an added condition
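For concreteness, a minimal calculated-column sketch of the delta I'm describing, assuming my table is named 'Data' with [ID] and [Timestamp] columns (hypothetical names):
Delta Seconds =
// earliest time stamp among all rows that share this row's ID
VAR EarliestForID =
    CALCULATE ( MIN ( 'Data'[Timestamp] ), ALLEXCEPT ( 'Data', 'Data'[ID] ) )
RETURN
    // seconds elapsed since that earliest time stamp
    DATEDIFF ( EarliestForID, 'Data'[Timestamp], SECOND )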
I have made a table as follows (https://apexinsights.net/blog/convert-date-range-to-list):
In this scenario, suppose I configure incremental refresh on the Start Date column: will Power BI support this correctly? I am asking because, if the refresh is for the last 2 days or last 2 months, it will fetch the source rows and apply the transform to the partition. My concern is that I will have to put the date parameter filter on Start Date prior to the non-folding steps so that the query folds (alternatively, Power Query will auto-apply the date filter so that the query can fold).
So when it pulls the data based on Start Date and applies the transforms, I'm not able to think clearly about what kind of partitions it will create: will they be based on Start Date or on the expanded date? Is query folding supported in this scenario?
This is quite a complicated scenario, where I would probably just avoid adding incremental refresh.
You would have to use the RangeStart/RangeEnd parameters twice in this query. Once that gets folded to the data source to retrieve ranges that overlap with the [RangeStart,RangeEnd) interval and a second time after expanding the ranges to filter out individual rows that fall outside [RangeStart,RangeEnd).
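A minimal M sketch of that double filter, assuming a source table named Ranges with date-typed Start Date and End Date columns, and the standard RangeStart/RangeEnd datetime parameters (source and names hypothetical):
let
    Source = Sql.Database("server", "db"),                      // hypothetical source
    Ranges = Source{[Schema = "dbo", Item = "Ranges"]}[Data],
    // First use: folds to the source; keeps ranges that overlap [RangeStart, RangeEnd)
    Overlapping = Table.SelectRows(Ranges, each [End Date] >= DateTime.Date(RangeStart) and [Start Date] < DateTime.Date(RangeEnd)),
    // Expand each range to one row per date (this step breaks folding)
    WithDates = Table.AddColumn(Overlapping, "Date", each List.Dates([Start Date], Duration.Days([End Date] - [Start Date]) + 1, #duration(1, 0, 0, 0))),
    Expanded = Table.ExpandListColumn(WithDates, "Date"),
    // Second use: applied in memory so each expanded row lands in the correct partition
    Partitioned = Table.SelectRows(Expanded, each [Date] >= DateTime.Date(RangeStart) and [Date] < DateTime.Date(RangeEnd))
in
    Partitioned
The refresh policy itself would still be defined on the date column; the second filter just keeps the expanded rows consistent with each partition's window.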
I have a table of companies with descriptive data about where we are in the sales stage with each company, and the date we entered that specific stage. As can be seen below, the stages are rows in a Process Step column.
My objective is to pivot this column so each Process Step is a column with a date below it, as shown in Excel:
I tried to edit the query and pivot the column upon loading it, but no matter which column I use as the "aggregate value" column, it results in some form of error.
My advice would be not to pivot the table in the query, and instead use measures to get the dates you want. The benefit of not pivoting is that you can still perform all sorts of other analytics; for instance, a Sankey chart would be hard to build properly from a pivoted table.
To get the pivot table you are showing from Excel, it's as simple as using a matrix visual in Power BI and putting Client code in Rows, Process Step in Columns, and Effective date in Values.
If you need to perform calculations between stages, that's also not too difficult. For instance, you can create a measure that shows only dates at a certain stage, another measure for another stage, and so on. For example:
Date uploaded = CALCULATE(MAX(Table[Effective Date]), FILTER(Table, Table[Process Step] = "Upload"))
Date exported = CALCULATE(MAX(Table[Effective Date]), FILTER(Table, Table[Process Step] = "Export"))
Time upload to export = DATEDIFF([Date uploaded], [Date exported], DAY)
These measures will work in the context of a client, assuming there is only one date for the upload step (and no Process Step field in rows or columns). If another scenario is needed, perhaps a different approach could be taken.
Please let me know if that solves your problem.
In Power BI I have a table with the following columns (this is a simplified version of the real table):
PullRequestId | CommitId | CommitDate
I want to find the first and last date of commits made for each pull request id.
The purpose is to calculate a metric on that data (for this example the time span of the commits).
I am not sure how to achieve it (measures or columns? what is the correct DAX expression?)
If you want to get a calculated table with this data to use later, the following DAX should do it:
SUMMARIZECOLUMNS(
    TableName[PullRequestId],
    "Min Date", MIN(TableName[CommitDate]),
    "Max Date", MAX(TableName[CommitDate])
)
If you just want to display a visual in Power BI, then the best choice would be to use Matrix visual with PullRequestId in Rows section and two CommitDate fields in Values section. Just set aggregation rule of the first one to "Earliest" and of the second one to "Latest" so you will get a table with PullRequestId and first and last commit dates. No DAX needed here.
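If the metric itself is what you're after, a minimal measure sketch for the commit time span, assuming the same TableName columns from the question:
Commit Span (days) =
// earliest-to-latest commit; evaluates per pull request when PullRequestId is on the visual
DATEDIFF ( MIN ( TableName[CommitDate] ), MAX ( TableName[CommitDate] ), DAY )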
I am working with a data set that has some duplicate rows. The rows are not straight duplicates, but have a time stamp less than a second apart. I'd like to remove these duplicates, but the question is how.
My current plan is to add two new columns, which are copies of the time stamp column but with one second added in one and one second subtracted in the other. I can then add steps to remove rows which have all other values the same, but whose time stamp matches another row's time stamp plus or minus one second. Doing one after the other should eliminate duplicates but not remove truly unique rows.
How can I accomplish this in Power Query?
I think your "current plan" approach is good - I would apply that in a separate Query, started "By Reference" to the original - I'd call it something like Non-duplicated time stamps.
I would duplicate the original time stamp column and then add the new +/- 1 second columns. I would use Unpivot Only Selected Columns on the 3 added time stamp columns to convert them from columns to rows. Then I would select the generated Value column and apply Keep Duplicates. That will keep just the first row of any duplicates found amongst the 3 time stamps.
Then back in the original query, I would add a Merge Queries step to connect it to the Non-duplicated time stamps query. I would match on the original time stamp column, possibly on other columns if required. The Join Kind would be Left Anti (rows only in first). That should remove your duplicates.
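A rough M sketch of that helper query, assuming the original query is named Source Data and its time stamp column is Timestamp (hypothetical names); here Table.Group stands in for the Keep Duplicates UI step and returns one row per duplicated value:
let
    Source = #"Source Data",
    // duplicate the time stamp and add the +/- 1 second variants
    Copied = Table.DuplicateColumn(Source, "Timestamp", "TS Copy"),
    Plus = Table.AddColumn(Copied, "TS Plus 1s", each [Timestamp] + #duration(0, 0, 0, 1), type datetime),
    Minus = Table.AddColumn(Plus, "TS Minus 1s", each [Timestamp] - #duration(0, 0, 0, 1), type datetime),
    // unpivot only the three added columns into Attribute/Value rows
    Unpivoted = Table.Unpivot(Minus, {"TS Copy", "TS Plus 1s", "TS Minus 1s"}, "Attribute", "Value"),
    // keep only time stamp values that occur more than once
    Grouped = Table.Group(Unpivoted, {"Value"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
    Duplicates = Table.SelectRows(Grouped, each [Count] > 1)
in
    Duplicates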
I have 40 million rows in my dataset. Each day I may get an extra 100 rows. Obviously I don't want to have to import the whole 40 million each time I do a data refresh. Is it possible to do an incremental refresh where only the new rows are added?
I don't think incremental update as you describe it is possible yet.
It looks like you can push rows with the Power BI REST API, if you're happy to switch to that.
However, you might find this workaround useful:
Split your table and query into two: one where date <= 'somedate' and one where date > 'somedate'.
Add an "empty query", use Table.Combine to join your two subtables. Use this as your main table.
Whenever you need to refresh, only refresh the second query (the one where date > 'somedate').
Every once in a while, when that second query starts taking a long time, change somedate to the current date and do a full refresh.
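A minimal sketch of the combining query, assuming the two subtables are named Historical Rows and Recent Rows (hypothetical names):
let
    // Historical Rows holds date <= 'somedate'; Recent Rows holds date > 'somedate'
    Combined = Table.Combine({#"Historical Rows", #"Recent Rows"})
in
    Combined
You can also untick Include in report refresh on the historical query in the Query Editor, so a normal refresh only re-reads the recent rows.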
The feature has now been implemented and is called Incremental refresh. Currently it is a Premium-only feature.