Merge Query Matching on Dates in Multiple Rows - powerbi

I'm trying to merge two queries in Power BI Desktop. Each row in the first query has a user and a date, and it should be matched to the row in the second query where the user is the same and the date is the closest one before the date in the first query.
In other scenarios where I need to match on more than one column, I usually create a composite key, but here it's not a direct match.
Examples of the 2 queries are:
QUERY1
User    Activity    Activity Date
User 1  Activity 1  2019-01-24
User 1  Activity 2  2019-03-03
User 1  Activity 3  2019-04-17
QUERY2
User    Status    Status Change Date
User 1  Status 1  2019-02-05
User 1  Status 2  2019-03-06
User 1  Status 3  2019-04-05
And the merged query I'm looking for is:
MERGED QUERY
User    Activity    Activity Date  Status
User 1  Activity 1  2019-01-24
User 1  Activity 2  2019-03-03     Status 1
User 1  Activity 3  2019-04-17     Status 3
Both queries are sourced from a REST API. If it was a SQL source, I'd use a SQL query to create a derived island table of start and stop dates based on Query2 and do a BETWEEN join against Query1 and have that be the source for Power BI.
Within the Power Query Editor, how would I get to the merged query result?

First, you want to do as you suggested and modify the status table to have start and stop dates instead of Status Change Date. You can do this by sorting, indexing, and self-merging as I've previously explained here and here.
Once you have that, you can load a copy of the status table in each row and use the User and Date columns to filter the table and finally return a single value for Status.
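let
    Source = <Query1 Source>,
    // Status is the status query reshaped to have [Start] and [Stop] columns.
    // Adjust column names to your data; Query1 in the question calls the date [Activity Date].
    #"Added Custom" =
        Table.AddColumn(Source, "Status",
            (C) => List.First(
                Table.SelectRows(Status,
                    each [User] = C[User] and
                         [Start] < C[Date] and
                         ([Stop] = null or C[Date] <= [Stop])
                )[Status]
            ),
            type text)
in
    #"Added Custom"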
This says we take the Status table and filter it so that based on the current row the User matches and the Date is between Start and Stop. From that filtered table, we select the Status column, which is a list data type, so we pick the first element of the list to get the text value of the only member of the list.
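The same "latest status strictly before the activity date" rule can be checked outside Power Query. The following is a minimal Python sketch, not the answer's M code; the helper names `build_status_index` and `status_as_of` are hypothetical, and the rows are the sample data from the question.

```python
from bisect import bisect_left
from datetime import date

# Sample rows from the question: (user, label, date).
activities = [
    ("User 1", "Activity 1", date(2019, 1, 24)),
    ("User 1", "Activity 2", date(2019, 3, 3)),
    ("User 1", "Activity 3", date(2019, 4, 17)),
]
status_changes = [
    ("User 1", "Status 1", date(2019, 2, 5)),
    ("User 1", "Status 2", date(2019, 3, 6)),
    ("User 1", "Status 3", date(2019, 4, 5)),
]

def build_status_index(changes):
    """Group status changes by user, sorted by change date (the Start dates)."""
    index = {}
    for user, status, changed in sorted(changes, key=lambda r: (r[0], r[2])):
        starts, labels = index.setdefault(user, ([], []))
        starts.append(changed)
        labels.append(status)
    return index

def status_as_of(index, user, when):
    """Return the status with the latest Start strictly before `when`, else None."""
    starts, labels = index.get(user, ([], []))
    pos = bisect_left(starts, when)  # count of Start dates < when
    return labels[pos - 1] if pos else None

index = build_status_index(status_changes)
merged = [(u, a, d, status_as_of(index, u, d)) for u, a, d in activities]
```

With the sample data, Activity 1 gets no status because it predates every status change, which matches the blank cell in the desired merged query.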

Related

Show filtered data when no value is selected in slicer in power bi

I have two tables, one for a slicer and the other for a details table. The details table has an InvoiceDate column where some rows have a blank InvoiceDate. The slicer table looks like below:
The slicer will only show the value of ID 1, like below.
Initially I want the slicer to be unchecked and the data to show only rows where InvoiceDate is blank. Once the user selects Include Invoiced Records in the slicer, it should show the full details, i.e. rows with blank dates plus rows with non-empty dates.
There are two other ways of doing what you want that are probably more 'correct' but I'll also describe a way to provide the behavior you describe.
Option one: Delete your second table. Add a calculated column to your details table as follows:
Invoice Status = IF (ISBLANK([Invoice Date]) = TRUE(), "Not yet invoiced", "Invoiced")
Create a slicer using [Invoice Status] and simply default it to show 'Not yet invoiced'. If users want to see the invoiced records, they just check the 'Invoiced' box in the slicer as well.
Option Two: Use Bookmarks and buttons to produce the desired effect. Create two buttons, one that says 'Include Invoiced Customers' and another that says 'Hide Invoiced Customers' -- create two bookmarks where one has the invoiced customers filtered out of the visual and one where the invoiced customers aren't filtered. Set each button's "Action" to the appropriate bookmark.
Option Three: Keep your 'slicer' table. Let's assume it's called 'Invoice Filter Selection'. Create a new measure:
IncludeDetailFilter =
IF (ISFILTERED('Invoice Filter Selection'[Value]) = True(),
1,
IF (ISBLANK(MAX(InvoiceDetails[Invoice Date])) = TRUE(), 1, 0)
)
When the slicer has a selection, it will be considered 'Filtered' and you will pass into the first branch of the IF where the measure always evaluates to 1. When the slicer isn't selected, the measure will evaluate to 1 or 0 depending on whether or not there are any values for Invoice Date in the row. Add this new measure as a filter on your invoice detail visual.
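To make the two branches of the measure concrete, here is a small Python sketch of the same decision. It is an illustration only, not DAX; the function names and the two sample rows are hypothetical.

```python
def include_detail_filter(slicer_selected, invoice_date):
    """Mirror of the measure: 1 keeps the row in the visual, 0 hides it."""
    if slicer_selected:
        # Slicer has a selection: every row passes.
        return 1
    # No selection: only rows with a blank invoice date pass.
    return 1 if invoice_date is None else 0

def visible_rows(rows, slicer_selected):
    """Apply the measure as a visual-level filter (keep rows where it equals 1)."""
    return [r for r in rows if include_detail_filter(slicer_selected, r[1]) == 1]

# Hypothetical detail rows: (invoice id, invoice date or None when blank).
rows = [("INV-1", None), ("INV-2", "2019-05-01")]
unchecked = visible_rows(rows, slicer_selected=False)
checked = visible_rows(rows, slicer_selected=True)
```

Here `unchecked` holds only the row with the blank date, while `checked` holds both rows.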
Hope it Helps.

Why can't BigQuery handle a query processing 4 TB of data?

I'm trying to run this query
SELECT
id AS id,
ARRAY_AGG(DISTINCT users_ids) AS users_ids,
MAX(date) AS date
FROM
users,
UNNEST(users_ids) AS users_ids
WHERE
users_ids != " 1111"
AND users_ids != " 2222"
GROUP BY
id;
where the users table is a split table with an id column, a user_ids column (comma separated), and a date column. The table holds more than 4 TB, and the query fails with a resources error:
Resources exceeded during query execution: Your project or organization exceeded the maximum disk and memory limit available for shuffle operations.
Any idea why?
id  userids  date
1   2,3,4    1-10-20
2   4,5,6    1-10-20
1   7,8,4    2-10-20
so the final result I'm trying to reach
id  userids    date
1   2,3,4,7,8  2-10-20
2   4,5,6      1-10-20
Execution details:
It's constantly repartitioning. I would guess that you're trying to cram too much into the aggregation step. Just remove the aggregation part; I don't even think you need the cross join here.
Use a subquery instead of this cross join + aggregation combo.
Edit: I just realized that you want to aggregate the arrays, but with distinct values:
WITH t AS (
  SELECT
    id,
    -- Deduplicate and filter each row's array before concatenating per id.
    ARRAY_CONCAT_AGG(ARRAY(
      SELECT DISTINCT uid FROM UNNEST(users_ids) AS uid
      WHERE uid != " 1111" AND uid != " 2222")) AS users_ids,
    MAX(date) AS date
  FROM users
  GROUP BY id
)
SELECT
  id,
  -- Deduplicate again across the rows that were concatenated for each id.
  ARRAY(SELECT DISTINCT uid FROM UNNEST(users_ids) AS uid) AS users_ids,
  date
FROM t
This is just a draft (I assume id is unique), but it should be something along those lines. Grouping by arrays is not possible ...
array_concat_agg() has no distinct so it comes in a second step.
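The intended result of the two-step deduplication can be sketched in plain Python against the sample rows from the question. This is only an illustration of the logic, not BigQuery code. The function name `merge_by_id` is hypothetical, and reading the sample dates as 1 and 2 October 2020 is an assumption.

```python
from datetime import date

# Sample rows from the question: (id, user ids, date).
# Assumption: "1-10-20" and "2-10-20" mean 1 and 2 October 2020.
rows = [
    (1, ["2", "3", "4"], date(2020, 10, 1)),
    (2, ["4", "5", "6"], date(2020, 10, 1)),
    (1, ["7", "8", "4"], date(2020, 10, 2)),
]
# Values filtered out by the WHERE clause (leading space as written in the query).
excluded = {" 1111", " 2222"}

def merge_by_id(rows, excluded):
    """Per id: distinct user ids in first-seen order, plus the latest date."""
    merged = {}
    for row_id, user_ids, day in rows:
        seen, latest = merged.get(row_id, ({}, day))
        for uid in user_ids:
            if uid not in excluded:
                seen.setdefault(uid, None)  # dict keys act as an ordered set
        merged[row_id] = (seen, max(latest, day))
    return {k: (list(seen), latest) for k, (seen, latest) in merged.items()}

result = merge_by_id(rows, excluded)
```

For id 1 this gives the ids 2, 3, 4, 7, 8 with the later of the two dates, matching the final result the question asks for.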

Power Bi - Count for a category for each month

I am just starting out with Power BI. Please use the table below as my sample. Assume that the Order table has a *-1 relationship with Order Status.
I want to create a bar graph to see the number of orders in each Order Status by month.
The month is on the bottom axis, and each month potentially has 3 bars: one bar per Order Status, showing its count for that month.
I need some direction. I know this is an open-ended question, but I am at a total loss.
I have created the same tables as you provided and joined them on order status id (*-1 relationship). Then created the status column.
Status = RELATED('Order Status'[Status] )
I then dragged this column onto the clustered column chart under both Legend and Values (summarised as count by default), which gave the following visual.
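What the chart computes is a count per (month, status) pair. A minimal Python sketch of that aggregation follows. The order rows and status names are hypothetical, since the sample table from the question is not reproduced here.

```python
from collections import Counter
from datetime import date

# Hypothetical lookup table, standing in for the Order Status table.
status_names = {1: "Open", 2: "Shipped", 3: "Cancelled"}

# Hypothetical orders: (order date, order status id).
orders = [
    (date(2019, 1, 5), 1),
    (date(2019, 1, 20), 1),
    (date(2019, 1, 22), 2),
    (date(2019, 2, 3), 3),
]

# Resolve the status name (like RELATED) and count orders per month and status.
counts = Counter((d.strftime("%Y-%m"), status_names[s]) for d, s in orders)
```

Each key in `counts` corresponds to one bar in the clustered column chart: the month picks the cluster and the status picks the bar within it.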

Join table based on the last hit in Power BI

I am using Power BI and need to join two tables, but I want to display only the last result. Below I show more details:
Table1:
number  description
263745  Bank reconciliation
Table2:
number  status
263745  progress
263745  completed
After joining the tables, the result:
number  description          status
263745  Bank reconciliation  progress
263745  Bank reconciliation  completed
But I would like to show only the last result, as below:
number  description          status
263745  Bank reconciliation  completed
What am I doing wrong?
Add an index column to Table2 (named id in the formula below).
Create a calculated column in Table1 like this:
Last Status = LOOKUPVALUE(Table2[status], Table2[number], Table1[number], Table2[id], CALCULATE(MAX(Table2[id]), FILTER(Table2, Table1[number] = Table2[number])))
If you have a date column rather than an index column, simply change the CALCULATE(MAX(...)) condition to take the max date instead of the max id.
If you like, hide Table2.
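The lookup amounts to "for each number, take the status on the row with the highest index". A minimal Python sketch with the sample data from the question follows. The function name `last_status` is hypothetical, and the index is generated with `enumerate` to stand in for the added index column.

```python
table1 = [(263745, "Bank reconciliation")]
table2 = [(263745, "progress"), (263745, "completed")]

# Stand-in for the index column added to Table2: (id, number, status).
indexed = [(i, n, s) for i, (n, s) in enumerate(table2, start=1)]

def last_status(indexed_rows):
    """For each number, keep the status from the row with the largest id."""
    best = {}
    for idx, number, status in indexed_rows:
        if number not in best or idx > best[number][0]:
            best[number] = (idx, status)
    return {number: status for number, (_, status) in best.items()}

lookup = last_status(indexed)
result = [(n, d, lookup.get(n)) for n, d in table1]
```

`result` contains a single row per number, carrying only the most recent status.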

Tableau data extract refresh

I have a database that I will build a dashboard with. A Tableau admin will refresh the extract every day.
example of data
Date item
1-jul. Book
2-jul. Cane
.
.
24-jul Rice
25-jul. Car
Every day a new row is added.
Question: how do I set up my date range filters in the data source view so that the extract refreshed by the admin always reflects the current data?
One solution, although a bit long, is:
Create a calculated field "date2" of type date, based on the date info in your data. Use a date function if, for example, the year is missing.
Create a calculated field "max date" of type date, written as a level-of-detail expression so it can be compared with the row-level date2:
{ FIXED : MAX([date2]) }
Then create another calculated field, "max date filter":
IF [date2] = [max date] THEN 1 END
Drag the "max date filter" field to Filters and set it to 1.
Drag item to Rows.
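The effect of the "max date filter" field can be sketched in Python: find the latest date in the data, then keep only the rows on that date. The function name `latest_rows` is hypothetical, and the year 2019 is an assumption because the sample data omits the year.

```python
from datetime import date

# Sample rows from the question; the year is an assumption (not given in the data).
rows = [
    (date(2019, 7, 1), "Book"),
    (date(2019, 7, 2), "Cane"),
    (date(2019, 7, 24), "Rice"),
    (date(2019, 7, 25), "Car"),
]

def latest_rows(rows):
    """Keep only rows whose date equals the maximum date in the data."""
    max_date = max(d for d, _ in rows)
    return [(d, item) for d, item in rows if d == max_date]

current = latest_rows(rows)
```

Because the maximum is recomputed from whatever rows are present, a row added by the next extract refresh becomes the new result without changing the filter.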