How to create a column based on grouped condition? - powerbi

My test table in Power BI:
IdRecord  Date                     Value
1         2022-04-25 23:45:00.000  100
1         2022-04-24 18:07:00.000  344
2         2022-05-01 23:45:00.000  5
2         2022-05-02 18:07:00.000  66
2         2022-05-03 18:07:00.000  31
I need to create a calculated column that marks the earliest record within each IdRecord group.
Desired output:
IdRecord  Date                     Value  IsFirst
1         2022-04-25 23:45:00.000  100    0
1         2022-04-24 18:07:00.000  344    1
2         2022-05-01 23:45:00.000  5      1
2         2022-05-02 18:07:00.000  66     0
2         2022-05-03 18:07:00.000  31     0

Answering my own question:
FirstRes =
-- Earliest date among all rows that share this row's IdRecord
VAR MinDate =
    CALCULATE (
        MIN ( 'Table'[Date] ),
        FILTER ( 'Table', 'Table'[IdRecord] = EARLIER ( 'Table'[IdRecord] ) )
    )
RETURN
    -- 1 when this row holds that earliest date, otherwise 0
    IF ( 'Table'[Date] = MinDate, 1, 0 )
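To check the logic outside Power BI, here is a minimal Python sketch of the same idea: find the minimum date per IdRecord, then flag the rows whose date equals that minimum. The function name `mark_first` and the tuple layout are assumptions made for illustration only; they are not part of the DAX solution.

```python
from datetime import datetime

def mark_first(rows):
    """Return each row with an extra IsFirst flag (1 = earliest date in its group)."""
    # First pass: earliest date seen for each IdRecord.
    earliest = {}
    for id_record, when, _value in rows:
        if id_record not in earliest or when < earliest[id_record]:
            earliest[id_record] = when
    # Second pass: compare each row's date with its group's minimum.
    return [
        (id_record, when, value, 1 if when == earliest[id_record] else 0)
        for id_record, when, value in rows
    ]

# Sample rows from the question.
rows = [
    (1, datetime(2022, 4, 25, 23, 45), 100),
    (1, datetime(2022, 4, 24, 18, 7), 344),
    (2, datetime(2022, 5, 1, 23, 45), 5),
    (2, datetime(2022, 5, 2, 18, 7), 66),
    (2, datetime(2022, 5, 3, 18, 7), 31),
]
flags = [row[3] for row in mark_first(rows)]
print(flags)  # [0, 1, 1, 0, 0]
```

The flags match the IsFirst column of the desired output.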

Related

DAX: Cumulative Completion Rate with Month Slicer

I'm trying to calculate the cumulative completion rate of all users over the months. The issue is that, with the table below, when I filter on October for example, the measure divides the users who finished up to October by all users except those who finished in November.
I have a dim_date table connected to the data table; the relationship is between Date in dim_date and Completion Date in the data table.
In the dim_date table I also number the months 1, 2, 3, 4, etc.
ID  Completion_status  Completion Date
1   0
2   0
3   0
4   0
5   0
6   1                  11/1/2022
7   1                  11/1/2022
8   1                  11/1/2022
9   1                  11/2/2022
10  1                  11/1/2022
11  1                  11/6/2022
12  1                  11/4/2022
13  1                  11/2/2022
14  1                  10/13/2022
15  1                  10/14/2022
16  1                  10/14/2022
17  1                  10/13/2022
18  1                  10/15/2022
19  1                  10/13/2022
20  1                  10/13/2022
21  1                  10/13/2022
22  1                  10/13/2022
23  1                  10/18/2022
24  1                  10/13/2022
25  1                  10/13/2022
26  1                  10/13/2022
27  1                  10/13/2022
28  1                  9/10/2022
29  1                  9/8/2022
The formula I use:
Completion% =
VAR comp rate = SUM(Table[completion_status]) / count(Table[ID])
Return
CALCULATE(Table[Completion%],filter(ALL(Dim_Date),Dim_Date[Month Number] <= MAX(Dim_Date[Month Number])))
The expected result when I filter:
on September is 2/29 = 7%
on October is 16/29 = 55%
on November is 24/29 = 83%
Something like:
=
VAR SelectedMonth =
    MIN( Dim_Date[Month Number] )
VAR CumulativeTotal =
    CALCULATE(
        COUNTROWS( 'Table' ),
        FILTER(
            ALL( Dim_Date ),
            Dim_Date[Month Number] <= SelectedMonth
                && NOT ( ISBLANK( Dim_Date[Month Number] ) )
        )
    )
VAR CountAllRows =
    CALCULATE( COUNTROWS( 'Table' ), ALL( Dim_Date ) )
RETURN
    DIVIDE( CumulativeTotal, CountAllRows )
I'm presuming that Dim_Date[Month Number] is blank when Table[Completion Date] is blank.
You may want to replace ALL with, for example, ALLSELECTED, depending on your required set-up.
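As a sanity check on the expected percentages, here is a small Python sketch of the arithmetic the measure is meant to perform: count completions in or before the selected month and divide by all users, finished or not. The helper name `cumulative_rate` is hypothetical, and comparing month numbers only makes sense within a single year, as in the sample data.

```python
from datetime import date

# Completion dates from the question's table; None marks the 5 users who have not finished.
completions = [None] * 5 + [
    date(2022, 11, 1), date(2022, 11, 1), date(2022, 11, 1), date(2022, 11, 2),
    date(2022, 11, 1), date(2022, 11, 6), date(2022, 11, 4), date(2022, 11, 2),
    date(2022, 10, 13), date(2022, 10, 14), date(2022, 10, 14), date(2022, 10, 13),
    date(2022, 10, 15), date(2022, 10, 13), date(2022, 10, 13), date(2022, 10, 13),
    date(2022, 10, 13), date(2022, 10, 18), date(2022, 10, 13), date(2022, 10, 13),
    date(2022, 10, 13), date(2022, 10, 13),
    date(2022, 9, 10), date(2022, 9, 8),
]

def cumulative_rate(completion_dates, selected_month):
    """Share of ALL users who completed in or before selected_month."""
    done = sum(
        1 for d in completion_dates
        if d is not None and d.month <= selected_month
    )
    # The denominator is every user, regardless of the month filter.
    return done / len(completion_dates)

rates = {m: round(100 * cumulative_rate(completions, m)) for m in (9, 10, 11)}
print(rates)  # {9: 7, 10: 55, 11: 83}
```

The key point is the same as in the DAX above: the month filter applies only to the numerator, never to the denominator.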

Power BI DAX - Grouping rows when a value is found in row

I have the table below. I need to group the rows by product and increment the group number after each row where Set = 1; the group number goes back to 1 when a new product starts on the next line. I have already created an index.
Index  Product  Set
1      Table    0
2      Table    0
3      Table    1
4      Table    0
5      Table    0
6      Table    1
7      Table    0
8      Table    1
9      Chair    0
10     Chair    0
11     Chair    0
12     Chair    1
13     Chair    0
14     Chair    0
15     Chair    1
Here's the result I'm after:
Index  Product  Set  Group
1      Table    0    1
2      Table    0    1
3      Table    1    1
4      Table    0    2
5      Table    0    2
6      Table    1    2
7      Table    0    3
8      Table    1    3
9      Chair    0    1
10     Chair    0    1
11     Chair    0    1
12     Chair    1    1
13     Chair    0    2
14     Chair    0    2
15     Chair    1    2
With this calculated column:
Grouping =
RANKX (
    FILTER (
        'fact',
        'fact'[Set] <> 0
            && EARLIER ( 'fact'[Product] ) = 'fact'[Product]
    ),
    'fact'[Index],
    ,
    ASC
)
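To see why ranking the current Index among the Set = 1 rows of the same product gives the group number, here is a minimal Python sketch. An ascending rank equals one plus the number of Set = 1 rows with a smaller Index, which is what the hypothetical helper `assign_groups` computes; it illustrates the idea and is not a translation of the DAX engine's behavior.

```python
def assign_groups(rows):
    """rows: list of (index, product, set_flag). Returns one group number per row."""
    groups = []
    for index, product, _flag in rows:
        # Set = 1 rows of the same product that come strictly before this row
        # each close one earlier group.
        closed = sum(
            1 for i, p, f in rows
            if p == product and f == 1 and i < index
        )
        groups.append(closed + 1)
    return groups

# Sample rows from the question.
table_flags = [0, 0, 1, 0, 0, 1, 0, 1]
chair_flags = [0, 0, 0, 1, 0, 0, 1]
rows = [(i + 1, "Table", f) for i, f in enumerate(table_flags)]
rows += [(i + 9, "Chair", f) for i, f in enumerate(chair_flags)]

groups = assign_groups(rows)
print(groups)  # [1, 1, 1, 2, 2, 2, 3, 3, 1, 1, 1, 1, 2, 2, 2]
```

The output matches the Group column of the desired result, including the reset to 1 when the product changes.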

Redshift AWS - Update table with lag() in sub query and cte

I have a Redshift database with the following entries:
Table name: subscribers
time_at              calc_subscribers  calc_unsubscribers  current_subscribers
2021-07-02 07:30:00  0                 0                   0
2021-07-02 07:45:00  39                8                   0
2021-07-02 08:00:00  69                17                  0
2021-07-02 08:15:00  67                21                  0
2021-07-02 08:30:00  48                23                  0
The goal is to calculate current_subscribers with the previous value.
current_subscribers = calc_subscribers - calc_unsubscribers + previous_current_subscribers
I do the following:
UPDATE subscribers sa
SET current_subscribers = COALESCE( sa.calc_subscribers - sa.calc_unsubscribers + sub.previous_current_subscribers,0)
FROM (
    SELECT
        time_at,
        LAG(current_subscribers, 1) OVER
            (ORDER BY time_at desc) previous_current_subscribers
    FROM subscribers
) sub
WHERE sa.time_at = sub.time_at
The problem is that the subquery "sub" is built from the values currently stored in the table, so previous_current_subscribers is always 0; the update does not work through the table row by row. The result is therefore current_subscribers = calc_subscribers - calc_unsubscribers + 0. I have also tried it with a CTE, unfortunately without success.
The result should look like this:
time_at              calc_subscribers  calc_unsubscribers  current_subscribers
2021-07-02 07:30:00  0                 0                   0
2021-07-02 07:45:00  39                8                   31
2021-07-02 08:00:00  69                17                  83
2021-07-02 08:15:00  67                21                  129
2021-07-02 08:30:00  48                95                  82
I am grateful for any ideas.
The problem you are running into is that you want to use the result of one row in the calculation of the next row. That is a recursive calculation, which I think is possible in this case but is expensive.
The result you are looking for is the sum of all calc_subscribers for this row and all previous rows, minus the sum of all calc_unsubscribers for this row and all previous rows. That is the difference between two window functions (sum ... over), ordered by time_at ascending so that "previous" means earlier:
sum(calc_subscribers) over (order by time_at rows unbounded preceding)
    - sum(calc_unsubscribers) over (order by time_at rows unbounded preceding) as current_subscribers
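The running-sum idea can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the values from the question's input table (where calc_unsubscribers is 23 in the last row); `itertools.accumulate` plays the role of `sum(...) over (order by time_at rows unbounded preceding)`.

```python
from itertools import accumulate

# (time_at, calc_subscribers, calc_unsubscribers) from the input table.
rows = [
    ("2021-07-02 07:30:00", 0, 0),
    ("2021-07-02 07:45:00", 39, 8),
    ("2021-07-02 08:00:00", 69, 17),
    ("2021-07-02 08:15:00", 67, 21),
    ("2021-07-02 08:30:00", 48, 23),
]

# Order by time_at ascending so each running total covers this row and earlier rows.
rows.sort(key=lambda r: r[0])

running_subs = accumulate(r[1] for r in rows)
running_unsubs = accumulate(r[2] for r in rows)

# current_subscribers = running subscribers minus running unsubscribers.
current = [s - u for s, u in zip(running_subs, running_unsubs)]
print(current)  # [0, 31, 83, 129, 154]
```

No row needs the stored value of the previous row, which is why this form avoids the recursion.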

Amazon Athena: Query to find out patients with compliance=0 for consecutive 10 days

Find all patients who have had compliance = 0 for the past 10 consecutive days, counting back from the current date, using Amazon Athena.
patient_id  compliance  create_date
1           0           2021-01-01
1           0           2021-01-02
1           0           2021-01-03
1           0           2021-01-04  -- rejected: fewer than 10 consecutive days
2           0           2021-01-01
2           0           2021-01-02
2           0           2021-01-03
2           0           2021-01-04
2           0           2021-01-05
2           0           2021-01-06
2           0           2021-01-07
2           0           2021-01-08
2           0           2021-01-09
2           0           2021-01-10  -- accepted: 10 consecutive days
There are multiple ways to achieve this. One is to take the difference in days between each date and the next one for the same patient, then check two rolling sums over the last X rows (X = 10 in your case): the sum of those deltas, which must equal 10, and the sum of the compliance value, which must be exactly 0:
with base as (
    select
        *,
        -- 9 preceding + current row = the last 10 rows per patient, in date order
        sum(delta) over (partition by patient_id order by date rows between 9 preceding and current row) as cumdelta,
        sum(compliance) over (partition by patient_id order by date rows between 9 preceding and current row) as cumcompliance
    from (
        select
            *,
            -- gap in days to the patient's next record; the last record counts as 1
            coalesce(date_diff('day', date, next_date), 1) as delta
        from (
            select
                patient_id,
                compliance,
                try_cast(date as date) as date,
                lead(try_cast(date as date)) over (partition by patient_id order by try_cast(date as date)) as next_date
            from data
        )
    )
)
select
    patient_id,
    compliance,
    date,
    case when (cumdelta = 10 and cumcompliance = 0) then 'yes' else null end as validated_compliance
from base
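For comparison, here is a small Python sketch of the underlying check, written as a plain streak counter rather than with rolling sums. The helper name `has_zero_streak` is hypothetical, and the sketch looks for any run of 10 consecutive calendar days with compliance = 0; anchoring the run to the current date would need one extra comparison against today's date.

```python
from datetime import date, timedelta

def has_zero_streak(records, length=10):
    """records: iterable of (day, compliance) for one patient."""
    # Keep only the days with compliance = 0, deduplicated and in date order.
    zero_days = sorted({day for day, compliance in records if compliance == 0})
    run = 0
    previous = None
    for day in zero_days:
        # Extend the run if this day directly follows the previous one, else restart.
        if previous is not None and day - previous == timedelta(days=1):
            run += 1
        else:
            run = 1
        if run >= length:
            return True
        previous = day
    return False

# Sample data from the question: patient 1 has 4 days, patient 2 has 10.
start = date(2021, 1, 1)
patients = {
    1: [(start + timedelta(days=i), 0) for i in range(4)],
    2: [(start + timedelta(days=i), 0) for i in range(10)],
}
result = {pid: has_zero_streak(recs) for pid, recs in patients.items()}
print(result)  # {1: False, 2: True}
```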

Power BI : DAX: Running Sum with fixed start date - even when filtering

I have two tables, with:
Entrydate, several categories
ChurnDate, several categories
The categories are connected via different tables, and the dates are connected with a Calendar.
Now I want to calculate how many customers I have. So I have following DAX formulas
1. SumChurn =
    CALCULATE(
        SUM('kuendigungen'[KUENDIGUNG]);
        FILTER(
            ALLSELECTED('Calendar'[Date]);
            ISONORAFTER('Calendar'[Date]; MAX('Calendar'[Date]); DESC)
        )
    )
2. SumEntry =
    CALCULATE(
        SUM('eintritt'[NEUMITGLIED]);
        FILTER(
            ALLSELECTED('Calendar'[Date]);
            ISONORAFTER('Calendar'[Date]; MAX('Calendar'[Date]); DESC)
        )
    )
3. TotalCustomers = SumEntry - SumChurn
This works, but in my diagram I want to filter the dates so that it only visualizes 2020, or the last 3 years. When I do this the calculation is wrong, because it only counts within that interval.
Is there a way to filter the date in my visuals while the start date of the cumulative sum in my calculation always stays fixed?
I don't want a new column, because I still want to filter by my customer categories...
Thanks,
Michaela
Edit: an attempt to explain it more clearly.
Example Table 1: contains new customers
Date unique_id1 unique_id2 unique_id3 cat1 cat2 cat3 cat4 cat5 cat6
1886-02-01 2070030124 550261 207000152145 207 0 0 1 0 0
1887-01-01 4350002756 4081878 435000010707 435 0 0 1 0 0
1888-01-01 7030000597 3206858 703000001279 703 0 0 1 0 0
1888-06-01 7030016696 3208056 703000005002 703 0 0 1 0 0
1888-09-01 8210024182 204124 821000008664 821 1 0 1 0 1
1889-01-01 7050055324 1988250 705000018309 705 1 0 1 0 0
1889-01-01 8250000278 439485 825000600296 825 0 0 1 0 0
1889-05-01 7030023754 3208355 703000000884 703 0 0 1 0 0
1889-10-01 2110071206 2849359 211000330019 211 0 1 1 0 0
1889-10-01 2110071236 2851371 211000120014 211 0 0 1 0 0
1889-11-14 5190529889 4260192 519000123846 519 1 0 1 0 0
1890-07-01 7330349030 4819467 733000013102 733 0 0 1 0 0
1890-07-01 7330152914 4817492 733000075604 733 1 0 1 0 1
1890-07-01 8190000889 486170 819000215708 819 0 0 1 0 0
1890-07-01 8190444976 486199 819000215740 819 0 0 1 0 0
1890-12-01 8190001388 476049 819000100005 819 0 0 1 0 0
1891-01-01 7030001248 3206975 703000000043 703 0 0 1 0 1
Example Table 2: contains leaving customers
similar to table 1
Example Calendar Table:
01.01.1990
02.01.1990
03.01.1990 ... (till today)
The output should be a measure:
for each day in the calendar: number of customers at this date = cumulative_sum(newcustomer) - cumulative_sum(churncustomer)
I get exactly this output when I run the calculations I wrote, but I want the measure to work so that, when I filter the date, the sum is still the cumulative sum from the very first date; otherwise the numbers are wrong.
Edit3:
I did exactly what mkrabbani posted, but it doesn't work for me. These are my calculations:
TotalKuendigungen =
CALCULATE(
    SUM('kuendigungen'[KUENDIGUNG]);
    FILTER (
        ALL ( 'Calendar'[Date] );
        'Calendar'[Date] <= MAX ( 'Calendar'[Date] )
    )
)
TotalNeukunden =
CALCULATE(
    SUM('eintritt'[NEUMITGLIED]);
    FILTER (
        ALL ( 'Calendar'[Date] );
        'Calendar'[Date] <= MAX ( 'Calendar'[Date] )
    )
)
AnzahlMitglieder = [SummeNeumitglied] - [SummeKuendigung]
This is how it looks for me (Neukunden: new customers, kündigungen: leaving customers, aktuellemitglieder: number of customers):
Picture 1: correct calculation
Picture 2: also correct calculation, but the filter doesn't work
Thanks for adding sample data and more explanation. If I understand your requirement correctly, the steps below should help you solve the issue.
Assumption: you have 3 tables (a date table, new_customer and leaving_customer), related as shown in the diagram below.
I created sample data for 10 days to illustrate your requirement. The cumulative counts in the table below are calculated with the basic cumulative pattern.
At this stage you need a measure that calculates the current number of customers for each row as cumulative_new_customer - cumulative_leaving_customer, which should not be hard for you.
The issue appears when you slice the data with a date slicer. Say you select date number 5, which is "January 05 2020" in my sample data. You want the final count based on January 01 to 05, but you get only the count from the single date "January 05 2020".
If that is correct, I suggest writing the 3 separate measures shown below. The picture below compares the output before and after slicing the data: the number of current customers for "January 05 2020" is 41 in both cases.
If all of this meets your expectation, you can use the 3 measures below as written.
1.
cumulative_new_customer =
CALCULATE (
    COUNT(new_customer[unique_id]),
    FILTER (
        ALL ( 'Dates'[Date] ),
        'Dates'[Date] <= MAX ( 'Dates'[Date] )
    )
)
2.
cumulative_leaving_customer =
CALCULATE (
    COUNT(leaving_customer[unique_id]),
    FILTER (
        ALL ( 'Dates'[Date] ),
        'Dates'[Date] <= MAX ( 'Dates'[Date] )
    )
)
3.
number_of_cutomer_today = [cumulative_new_customer] - [cumulative_leaving_customer]
Hope the above details will help you.
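The behavior these measures aim for can be sketched in Python: the date filter decides which days are shown, while each day's count still accumulates from the very first record. The sample dates below are made up for illustration (they are not the data from the pictures), and `customers_on` is a hypothetical helper name.

```python
from datetime import date

def customers_on(day, joined, left):
    """Cumulative joins minus cumulative leaves, up to and including `day`."""
    # No lower bound on the dates: the running total always starts at the first record.
    total_joined = sum(1 for d in joined if d <= day)
    total_left = sum(1 for d in left if d <= day)
    return total_joined - total_left

# Hypothetical join and leave dates, one entry per customer.
joined = [
    date(2020, 1, 1), date(2020, 1, 1), date(2020, 1, 2),
    date(2020, 1, 4), date(2020, 1, 5), date(2020, 1, 5),
]
left = [date(2020, 1, 3), date(2020, 1, 5)]

# A slicer that keeps only January 4 and 5 changes which days are displayed,
# not how far back each day's total reaches.
visible_days = [date(2020, 1, 4), date(2020, 1, 5)]
counts = {day.isoformat(): customers_on(day, joined, left) for day in visible_days}
print(counts)  # {'2020-01-04': 3, '2020-01-05': 4}
```

This mirrors the role of ALL ( 'Dates'[Date] ) in the measures: the slicer's selection is removed before the "up to the current date" filter is applied.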