Redshift AWS - Update table with lag() in sub query and cte - sql-update

I have a Redshift database with the following entries:
table name = subscribers
time_at
calc_subscribers
calc_unsubscribers
current_subscribers
2021-07-02 07:30:00
0
0
0
2021-07-02 07:45:00
39
8
0
2021-07-02 08:00:00
69
17
0
2021-07-02 08:15:00
67
21
0
2021-07-02 08:30:00
48
23
0
The goal is to calculate current_subscribers with the previous value.
current_subscribers = calc_subscribers - calc_unsubscribers + previous_current_subscribers
I do the following:
UPDATE subscribers sa
SET current_subscribers = COALESCE( sa.calc_subscribers - sa.calc_unsubscribers + sub.previous_current_subscribers,0)
FROM (
SELECT
time_at,
LAG(current_subscribers, 1) OVER
(ORDER BY time_at desc) previous_current_subscribers
FROM subscribers
) sub
WHERE sa.time_at = sub.time_at
The problem is that in the sub query "sub" a table is generated that is based on the current values in the table, and thus previous_current_subscribers is always 0. Instead of going through this row by row. So the result is: current_subscribers = calc_subscribers - calc_unsubscribers + 0 I have also already tried it with CTE, unfortunately without success:
The result should look like this:
time_at
calc_subscribers
calc_unsubscribers
current_subscribers
2021-07-02 07:30:00
0
0
0
2021-07-02 07:45:00
39
8
31
2021-07-02 08:00:00
69
17
83
2021-07-02 08:15:00
67
21
129
2021-07-02 08:30:00
48
95
82
I am grateful for any ideas.

The problem you are running into is that you want to use the result of one row in the calculation of the current row. This is recursive which I think you can do in this case but is expensive.
The result you are looking for is the sum of all calc_subscribers for this row and previous rows minus the sum of all calc_unsubscribers for this row and previous rows. This is the difference between 2 window functions - sum over.
sum(calc_subscribers) over (order by time_at desc rows unbounded preceding) - sum(calc_unsubscribers) over (order by time_at desc rows unbounded preceding) as current_subscribers

Related

How to create a column based on grouped condition?

My test tabe in powerbi:
IdRecord
Date
Value
1
2022-04-25 23:45:00.000
100
1
2022-04-24 18:07:00.000
344
2
2022-05-01 23:45:00.000
5
2
2022-05-02 18:07:00.000
66
2
2022-05-03 18:07:00.000
31
I require to create a calculated column to mark the earliest of the records grouped by id.
Desired output
IdRecord
Date
Value
IsFirst
1
2022-04-25 23:45:00.000
100
0
1
2022-04-24 18:07:00.000
344
1
2
2022-05-01 23:45:00.000
5
1
2
2022-05-02 18:07:00.000
66
0
2
2022-05-03 18:07:00.000
31
0
Answering to myself
FirstRes= VAR MYMIN = CALCULATE(
MIN(Table[Date]),
FILTER ( Table, Table[IdRecord] = EARLIER(Table[IdRecord]))
)
RETURN
IF(CALCULATE(
MIN(MIN(Table[Date]),MYMIN),
FILTER ( Table, Table[IdRecord] = EARLIER ( Table[IdRecord] ) )
) = Table[Date],1,0)

Group by and then sum value

I am struggling to get this going and could need some help. I have the following setup:
Order Item Material Value
22 1 100 27,5
22 1 200 27,5
22 1 300 27,5
22 2 100 33
22 3 500 101
26 1 500 88
26 1 600 88
I have duplicate values becaue of the Material, so I want to group by Order, Item and Value and then calculate the total Value in a DAX measure.
After grouping:
Order Item Value
22 1 27,5
22 2 33
22 3 101
26 1 88
The final Value:
Total Measure = 249,5
I tried the following DAX expression for the Total Measure:
Total Measure = Summarize('Table1'; 'Table1'[Order]; 'Table1'[Item]; "Sum Value:"; Sum('Table1'[Value]))
It gives me the error:
Multiple columns cannot be converted to a scalar value
So I tried:
Total Measure = Sumx('Table1'; Summarize('Table1'; 'Table1'[Order]; 'Table1'[Item]; "Sum Value:"; Sum('Table1'[Value])))
But this didnt work either. For every help thanks in advance.
The following code should be what you are looking for
Measure1 =
SUMX (
SUMMARIZE (
Table1;
Table1[Order];
Table1[Item];
Table1[Value];
"TotalSum"; SUM ( Table1[Value] )
);
[Value]
)
In this case, you can simply use the VALUES function instead of SUMMARIZE.
Total Measure = SUMX ( VALUES ( Table1[Value] ), [Value] )
This iterates over each unique Value and adds Value to the sum.

Power BI What if analysis

I have a matrix Power BI visualization which is like
Jan Feb Mar April
Client1 10 20 30 10
Client2 15 25 65 80
Client3 66 22 54 12
I have created 3 what if parameters slicer table (having values from 1 to 4) for each client
For example, If the value of the first slicer is 1 and the second is 2 and the third is 2 then I want
Jan Feb Mar April
Client1 0 20 30 10
Client2 0 0 65 80
Client3 0 0 54 12
That is, it should replace the value with zero. I have been able to achieve that for one client using Dateadd function (by adding month)
Measure = CALCULATE(SUM('Table'[Value]),
DATEADD('Table'[Column], Parameter[Parameter Value], MONTH))
and I have used this measure to display the value, but how to make it work for the other two clients as well .
Let say you have three parameter tables as follows
Parameter1 Parameter2 Parameter3
Value1 Value2 Value3
------ ------ ------
1 1 1
2 2 2
3 3 3
4 4 4
and each of them has its own slicer. Then the measure you are after might look something like this:
Measure =
VAR Val1 = MAX(Parameter1[Value1])
VAR Val2 = MAX(Parameter2[Value2])
VAR Val3 = MAX(Parameter3[Value3])
VAR CurrClient = MAX('Table'[Client])
VAR CurrMonth = MONTH(DATEVALUE(MAX('Table'[Month]) & " 1, 2000"))
RETURN SWITCH(CurrClient,
"Client1", IF(CurrMonth <= Val1, 0, SUM('Table'[Value])),
"Client2", IF(CurrMonth <= Val2, 0, SUM('Table'[Value])),
"Client3", IF(CurrMonth <= Val3, 0, SUM('Table'[Value])),
SUM('Table'[Value])
)
Basically, you read in each parameter and compare them to the month in the current cell.

rolling up groups in a matrix

Here is the data I have, I use proc tabulate to present it how it is presented in excel, and to make the visualization easier. The goal is to make sure groups strictly below the diagonal (i know it's a rectangle, the (1,1) (2,2)...(7,7) "diagonal") to roll up the column until it hits the diagonal or makes a group size of at least 75.
1 2 3 4 5 6 7 (month variable)
(age)
1 80 90 100 110 122 141 88
2 80 90 100 110 56 14 88
3 80 90 87 45 12 41 88
4 24 90 100 110 22 141 88
5 0 1 0 0 0 0 2
6 0 1 0 0 0 0 6
7 0 1 0 0 0 0 2
8 0 1 0 0 0 0 11
Ive already used if/thens to regroup certain data values, but I need a general way to do it for other sets.
Thanks in advance
desired results
1 2 3 4 5 6 7 (month variable)
(age)
1 80 90 100 110 122 141 88
2 80 90 100 110 56 14 88
3 104 90 87 45 12 41 88
4 0 94 100 110 22 141 88
5 0 0 0 0 0 0 2
6 0 0 0 0 0 0 6
7 0 0 0 0 0 0 13
8 0 0 0 0 0 0 0
Mock up some categorical data for some patients who have to be counted
data mock;
do patient_id = 1 to 2500;
month = ceil(7*ranuni(123));
age = ceil(8*ranuni(123));
output;
end;
stop;
run;
Create a tabulation of counts (N) similar to the one shown in the question:
options missing='0';
proc tabulate data=mock;
class month age;
table age,month*n=''/nocellmerge;
run;
For each month get the sub-diagonal patient count
proc sql;
/* create table subdiagonal_column as */
select month, count(*) as subdiag_col_freq
from mock
where age > month
group by month;
For each row get the pre-diagonal patient count
/* create table prediagonal_row as */
select age, count(*) as prediag_row_freq
from mock
where age > month
group by age;
other sets can be tricky if the categorical values are not +1 monotonic. To do a similar process for non-montonic categorical values you will need to create surrogate variables that are +1 monotonic. For example:
data mock;
do item_id = 1 to 2500;
pet = scan ('cat dog snake rabbit hamster', ceil(5*ranuni(123)));
place = scan ('farm home condo apt tower wild', ceil(6*ranuni(123)));
output;
end;
run;
proc tabulate data=mock;
class pet place;
table pet,place*n=''/nocellmerge;
run;
proc sql;
create table unq_pets as select distinct pet from mock;
create table unq_places as select distinct place from mock;
data pets;
set unq_pets;
pet_num = _n_;
run;
data places;
set unq_places;
place_num = _n_;
run;
proc sql;
select distinct place_num, mock.place, count(*) as subdiag_col_freq
from mock
join pets on pets.pet = mock.pet
join places on places.place = mock.place
where pet_num > place_num
group by place_num
order by place_num
;

Ranking variables (not observations)

The questionnaire I have data from asked respondents to rank 20 items on a scale of importance to them. The lower end of the scale contained a "bin" in which respondents could throw away any of the 20 items that they found completely unimportant to them. The result is a dataset with 20 variables (1 for every item). Every variable receives a number between 1 and 100 (and 0 if the item was thrown in the bin)
I would like to recode the entries into a ranking of the variables for every respondent. So all variables would receive a number between 1 and 20 relative to where that respondent ranked it.
Example:
Current:
item1 item2 item3 item4 item5 item6 item7 item8 etc.
respondent1 67 44 29 7 0 99 35 22
respondent2 0 42 69 50 12 0 67 100
etc.
What I want:
item1 item2 item3 item4 item5 item6 item7 item8 etc.
respondent1 7 6 4 2 1 8 5 3
respondent2 1 4 7 5 3 1 6 8
etc.
As you can see with respondent2, I would like items that received the same value, to get the same rank and the ranking to then skip a number.
I have found a lot of info on how to rank observations but I have not found out how to rank variables yet. Is there anyone that knows how to do this?
Here is one solution using reshape:
/* Create sample data */
clear *
set obs 2
gen respondant = "respondant1"
replace respondant = "respondant2" in 2
set seed 123456789
forvalues i = 1/10 {
gen item`i' = ceil(runiform()*100)
}
replace item2 = item1 if respondant == "respondant2"
list
+----------------------------------------------------------------------------------------------+
| respondant item1 item2 item3 item4 item5 item6 item7 item8 item9 item10 |
|----------------------------------------------------------------------------------------------|
1. | respondant1 14 56 69 62 56 26 43 53 22 27 |
2. | respondant2 65 65 11 7 88 5 90 85 57 95 |
+----------------------------------------------------------------------------------------------+
/* reshape long first */
reshape long item, i(respondant) j(itemNum)
/* Rank observations, accounting for ties */
by respondant (item), sort : gen rank = _n
replace rank = rank[_n-1] if item[_n] == item[_n-1] & _n > 1
/* reshape back to wide format */
drop item // optional, you can keep and just include in reshape wide
reshape wide rank, i(respondant) j(itemNum)