I have a dataset which looks like this:
ID    2017  2018  2019  2020
2017  30    24    20    18
2018  30    24    20    18
2019  30    24    20    18
2020  30    24    20    18
I am looking to create an array based on a few inputs:
%let FixedorFloating = '1 or 0';
%let Repricingfrequency = n Years;
%let LastRepricingDate = 'Date'n;
The array criterion is that flag = 1 for the ID year and every second year after it (Year = ID + 2i for i = 0, 1, ...), and 0 otherwise,
e.g.
ID = 2017 then flag =1 for Years 2017 and 2019, 0 otherwise
ID = 2018 then flag = 1 for Years 2018 and 2020, 0 otherwise
ID = 2019 then flag = 1 for Year 2019 , 0 otherwise
ID = 2020 then flag = 1 for year 2020, 0 otherwise
My current code is below. The year(i+2) branch (highlighted in red) is giving me trouble, but the year(i) branch works fine.
data ReferenceRateContract;
    set refratecontract;
    *arrays for years and flags;
    array _year(2017:2022) year2017-year2022;
    array _flag(2017:2022) flag2017-flag2022;
    *loop over array;
    if &FixedorFloating=1
    then do i=&dateoflastrepricing to hbound(_year);
        /*check if year matches year in variable name*/
        if put(ID, 4.) = compress(vname(_year(i)),, 'kd')
            then _flag(i)=1;
        else _flag(i)=0;
    end;
    else if &fixedorfloating=0
    then do i=&dateoflastrepricing to hbound(_year);
        if put(ID, 4.) = compress(vname(_year(i)),, 'kd')
            then _flag(i)=1;
        else if put(ID, 4.) = compress(vname(_year(i+2)),, 'kd')
            then _flag(i)=1;
        else _flag(i)=0;
    end;
    drop i;
run;
data referenceratecontract;
    set referenceratecontract;
    keep flag2017--flag2020;
run;
I'm looking for a way to flag based on Year = ID + 2i. TIA
else if put(ID, 4.) = compress(vname(_year(i+2)),, 'kd') then _flag(i)=1;
The issue was in the line above: I should have used i - 2 instead of i + 2.
I also had to change my array limits, because the hbound(_year) loop bound made the loop iterate all the way to the end of the array.
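To check the flags a corrected data step should produce, the rule can be sketched outside SAS. This is a minimal Python sketch, not the SAS fix itself. It assumes the rule is "flag = 1 when Year = ID + k * step for some integer k >= 0", with step = 2 standing in for the repricing frequency. The names flags_for_id, YEARS and STEP are made up for the illustration.

```python
# Hypothetical helper: compute the expected flag row for one ID.
# Assumption: flag = 1 when year = id + k * step for an integer k >= 0.
YEARS = [2017, 2018, 2019, 2020]
STEP = 2  # stands in for the repricing frequency in years


def flags_for_id(id_year, years=YEARS, step=STEP):
    """Return a list of 0/1 flags, one per year column."""
    return [
        1 if year >= id_year and (year - id_year) % step == 0 else 0
        for year in years
    ]


# Expected flags for each ID in the example dataset.
expected = {id_year: flags_for_id(id_year) for id_year in YEARS}
for id_year, row in expected.items():
    print(id_year, row)
```

Comparing this table against flag2017--flag2020 in the SAS output is a quick way to confirm that the i - 2 version and the new array limits behave as intended.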
I have a dataset like below, and want to collapse a subject so that I can see if they were diagnosed with a disease at all within the past 3 years using SAS. Disease1-3 are binary yes/no flags.
For example - for subject a in 2021, since they had all 3 diseases in the prior year of 2020, they should also have flags for all those diseases in 2021 and 2022.
subject  year  disease1  disease2  disease3
a        2020  1         1         1
a        2021  0         0         0
a        2022  0         0         0
b        2020  0         1         0
b        2021  1         0         0
b        2022  0         0         1
I'm hoping it would look something like this.
subject  year  disease1  disease2  disease3
a        2020  1         1         1
a        2021  1         1         1
a        2022  1         1         1
b        2020  0         1         0
b        2021  1         1         0
b        2022  1         1         1
What would be the best way to go about this? I've tried using a do loop and the retain statement, but I get stuck because there are multiple columns to consider (disease1-disease3).
Store the max value of disease into a temporary variable. Retain this for each group. If the stored max value is ever 1, set all subsequent values to be 1 for each disease.
data want;
    set have;
    by subject year;
    array disease[*] disease1-disease3;
    array disease_max[3] _temporary_;
    retain disease_max;
    do i = 1 to dim(disease);
        if(first.subject) then disease_max[i] = 0; /* Reset disease max counter for each subject */
        if(disease[i] = 1) then disease_max[i] = 1; /* Store max disease value */
        if(disease_max[i] = 1) then disease[i] = 1; /* Set disease to 1 if disease_max is 1 */
    end;
    drop i;
run;
data have;
    input subject $ year disease1 disease2 disease3;
    datalines;
a 2020 1 1 1
a 2021 0 0 0
a 2022 0 0 0
b 2020 0 1 0
b 2021 1 0 0
b 2022 0 0 1
;
data temp;
    set have;
    array d disease:;
    do over d;
        if d = 0 then d = .;
    end;
run;
data want;
    update temp(obs=0) temp;
    by subject;
    array d disease:;
    do over d;
        if d = . then d = 0;
    end;
    output;
run;
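Both approaches above do the same thing: once a disease flag has been 1 for a subject, it stays 1 on every later row for that subject. As a cross-check of the expected table, here is a minimal sketch of that carry-forward in Python rather than SAS. It assumes the rows are already ordered by year within each subject, and carry_forward is a name made up for the example.

```python
# Sample rows: (subject, year, disease1, disease2, disease3)
have = [
    ("a", 2020, 1, 1, 1),
    ("a", 2021, 0, 0, 0),
    ("a", 2022, 0, 0, 0),
    ("b", 2020, 0, 1, 0),
    ("b", 2021, 1, 0, 0),
    ("b", 2022, 0, 0, 1),
]


def carry_forward(rows):
    """Keep a running max of each flag per subject (rows sorted by year)."""
    running = {}  # subject -> list of flags seen so far
    out = []
    for subject, year, *flags in rows:
        state = running.setdefault(subject, [0] * len(flags))
        for k, flag in enumerate(flags):
            if flag == 1:
                state[k] = 1  # once set, the flag stays set
        out.append((subject, year, *state))
    return out


want = carry_forward(have)
for row in want:
    print(row)
```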
I have the following problem: I would like to build a running total of a column and, on each row, divide that running total by the sum of the whole column, until a specific value is reached. In pseudocode it would look like this:
data;
set auto;
sum_of_whole_column = sum(price);
subtotal[i] = 0;
i =1;
do until (subtotal[i] = 70000)
subtotal[i] = (subtotal[i] + subtotal[i+1])/sum_of_whole_column
i = i+1
end;
run;
I get an error saying that I haven't defined an array. Can I use something else instead of subtotal[i]? And how can I put a column into an array? I tried the following, but it doesn't work (auto is the dataset and price is the column I want to put into an array):
data invent_array;
set auto;
array price_array {1} price;
run;
EDIT: maybe the dataset I used is helpful :)
DATA auto ;
    LENGTH make $ 20 ;
    INPUT make $ 1-17 price mpg rep78 ;
    CARDS;
AMC Concord       4099 22 3
AMC Pacer         4749 17 3
Audi 5000         9690 17 5
Audi Fox          6295 23 3
BMW 320i          9735 25 4
Buick Century     4816 20 3
Buick Electra     7827 15 4
Buick LeSabre     5788 18 3
Cad. Eldorado     14500 14 2
Olds Starfire     4195 24 1
Olds Toronado     10371 16 3
Plym. Volare      4060 18 2
Pont. Catalina    5798 18 4
Pont. Firebird    4934 18 1
Pont. Grand Prix  5222 19 3
Pont. Le Mans     4723 19 3
;
RUN;
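Perhaps I am missing your point, but your subtotal will never equal 70,000 if you divide by the sum of its column: the maximum value will be 1. Your running sum, however, can reach or exceed 70,000. Note that sum(price) in the query below is computed over stage1, that is, only over the rows that were kept, which is why the last subtotal comes out as 1.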
data stage1;
    retain _sum 0;
    set auto;
    _sum = sum(_sum, price);
    if _sum < 70000 then output;
run;
proc sql;
    create table want as
    select t1.*, t1._sum/sum(price) as subtotal
    from stage1 as t1;
quit;
subtotal
0.0607268256
0.1310834235
0.2746411058
0.3679017467
0.5121261056
0.5834753107
0.6994325842
0.7851820027
1
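The numbers above can be reproduced outside SAS as a sanity check. This is a minimal Python sketch of the same two steps, not a replacement for the data step: keep rows while the running total stays below 70000, then divide each running total by the sum of the kept prices. The names LIMIT, running, kept_total and subtotal are made up for the example.

```python
from itertools import accumulate

# Prices from the auto dataset, in input order.
prices = [4099, 4749, 9690, 6295, 9735, 4816, 7827, 5788,
          14500, 4195, 10371, 4060, 5798, 4934, 5222, 4723]
LIMIT = 70000

# Step 1: running total, keeping only rows where it is still below the limit.
running = [total for total in accumulate(prices) if total < LIMIT]

# Step 2: divide by the sum of the kept rows (what sum(price) sees in stage1).
kept_total = sum(prices[:len(running)])
subtotal = [total / kept_total for total in running]

for value in subtotal:
    print(f"{value:.10f}")
```

Dividing by sum(prices) instead of kept_total would give the share of the whole column, in which case the last kept row would be below 1.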
I have a Power BI matrix visualization that looks like this:
          Jan  Feb  Mar  April
Client1   10   20   30   10
Client2   15   25   65   80
Client3   66   22   54   12
I have created three what-if parameter slicer tables (each with values from 1 to 4), one per client.
For example, if the value of the first slicer is 1, the second is 2 and the third is 2, then I want:
          Jan  Feb  Mar  April
Client1   0    20   30   10
Client2   0    0    65   80
Client3   0    0    54   12
That is, it should replace the value with zero for the first n months, where n is the client's slicer value. I have been able to achieve that for one client using the DATEADD function (by adding months):
Measure = CALCULATE(SUM('Table'[Value]),
    DATEADD('Table'[Column], Parameter[Parameter Value], MONTH))
I have used this measure to display the value, but how can I make it work for the other two clients as well?
Let's say you have three parameter tables as follows:
Parameter1   Parameter2   Parameter3
Value1       Value2       Value3
------       ------       ------
1            1            1
2            2            2
3            3            3
4            4            4
and each of them has its own slicer. Then the measure you are after might look something like this:
Measure =
VAR Val1 = MAX(Parameter1[Value1])
VAR Val2 = MAX(Parameter2[Value2])
VAR Val3 = MAX(Parameter3[Value3])
VAR CurrClient = MAX('Table'[Client])
VAR CurrMonth = MONTH(DATEVALUE(MAX('Table'[Month]) & " 1, 2000"))
RETURN SWITCH(CurrClient,
    "Client1", IF(CurrMonth <= Val1, 0, SUM('Table'[Value])),
    "Client2", IF(CurrMonth <= Val2, 0, SUM('Table'[Value])),
    "Client3", IF(CurrMonth <= Val3, 0, SUM('Table'[Value])),
    SUM('Table'[Value])
)
Basically, you read in each parameter and compare them to the month in the current cell.
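The comparison the measure makes in each cell can be checked outside Power BI. This is a minimal sketch in Python, not DAX, assuming months are numbered 1 to 4 in column order and that a client with no parameter keeps all its values. The function apply_cutoffs and the dictionary names are made up for the example.

```python
# Matrix values per client, in month order (Jan, Feb, Mar, April).
values = {
    "Client1": [10, 20, 30, 10],
    "Client2": [15, 25, 65, 80],
    "Client3": [66, 22, 54, 12],
}

# One slicer value per client (the example from the question: 1, 2, 2).
cutoffs = {"Client1": 1, "Client2": 2, "Client3": 2}


def apply_cutoffs(values, cutoffs):
    """Zero out every cell whose month number is <= the client's cutoff."""
    result = {}
    for client, row in values.items():
        cutoff = cutoffs.get(client, 0)  # no parameter: keep everything
        result[client] = [
            0 if month_no <= cutoff else value
            for month_no, value in enumerate(row, start=1)
        ]
    return result


masked = apply_cutoffs(values, cutoffs)
for client, row in masked.items():
    print(client, row)
```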
I have an unbalanced longitudinal dataset Store_data:
Period  Store  Sales
Jan     A      12
Feb     A      10
March   A      8
April   A      3
Jan     B      5
Feb     B      19
March   B      7
April   B      8
Jan     C      5
Feb     C      19
March   C      7
April   C      8
At present, in order to create Sales lags of up to 2 years, I have to create the lag for each order manually, e.g.:
data Store_lag;
    set Store_data;
    by Store;
    Sales_Lag1=lag(Sales);
    if first.Store then Sales_Lag1=.;
    Sales_Lag2=lag(Sales_Lag1);
    if first.Store then Sales_Lag2=.;
    *etc.....;
run;
My question is whether there is a macro to create such variables. It gets especially tedious when the number of lag orders gets large.
Array processing really should do just fine here. Here's an example.
data want;
    set have;
    by store;
    array lags[1:4] lags0-lags3;
    retain lags:;
    if first.store then
        call missing(of lags[*]); *clear out the array for each store;
    do _i = dim(lags) to 2 by -1; *move the stack to the right;
        lags[_i] = lags[_i-1];
    end;
    lags[1] = sales; *set the first one;
    drop lags0; *lags0 is the current sales, of course;
run;
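To see what the shifting stack produces, the same idea can be sketched outside SAS. This is a minimal Python sketch under the assumption that rows are already in period order within each store. The function add_lags and the parameter n_lags are made up for the example, and None plays the role of a SAS missing value.

```python
from collections import defaultdict

# (period, store, sales) rows from the question, in period order per store.
store_data = [
    ("Jan", "A", 12), ("Feb", "A", 10), ("March", "A", 8), ("April", "A", 3),
    ("Jan", "B", 5), ("Feb", "B", 19), ("March", "B", 7), ("April", "B", 8),
    ("Jan", "C", 5), ("Feb", "C", 19), ("March", "C", 7), ("April", "C", 8),
]


def add_lags(rows, n_lags=3):
    """Append lag1..lag<n_lags> of sales to each row, restarting per store."""
    history = defaultdict(list)  # store -> earlier sales, most recent first
    out = []
    for period, store, sales in rows:
        previous = history[store]
        lags = [previous[k] if k < len(previous) else None
                for k in range(n_lags)]
        out.append((period, store, sales, *lags))
        previous.insert(0, sales)  # push current sales onto the stack
    return out


lagged = add_lags(store_data)
for row in lagged:
    print(row)
```

Changing n_lags is the counterpart of changing the array bounds in the data step, so no per-lag code is needed in either version.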
I have the dataset below, in which each employee is tied to a manager position.
I need to find the employee ID of each employee's manager using the manager position.
If nobody in the immediate manager's position is ACTIVE, we need to move up to the manager's manager position and check whether any ACTIVE employee is tied to that position. This continues until an ACTIVE manager is found.
ID -> Employee ID
PSTN -> Employee Position code
MPSTN-> Manager Position code
STAT -> Employee Status (T - Term, A - Active)
Input Dataset:
data input;
input id pstn mpstn stat$;
datalines;
1 10 30 A
2 20 30 T
3 30 40 T
6 30 40 T
4 40 50 A
7 40 50 T
5 50 50 A
;
run;
Output Dataset expected:
ID MGR_ID
1 4
2 4
3 4
6 4
4 5
7 5
5 5
I tried to handle the recursive nature of the problem with the POINT= option.
It works fine except for the recursive part, that is, searching for the next-level active manager.
data output ;
    set input;
    flag = 1;
    do I = 1 to last while (flag=1);
        set input(rename=(pstn=pstn1 stat=stat1 mpstn=mpstn1 id=id1)) nobs=last
            point=I;
        if mpstn = pstn1 and stat1 = 'A' then
        do;
            MGRID = id1;
            I=1;
            flag=0;
        end;
        else flag=1;
    end;
run;
Please help me with this.
You can load the input dataset into a hash object and use it to look up the values. Instead of calling the lookup recursively, I suggest putting the lookup into a do while() loop.
data input;
    input id pstn mpstn stat$;
    format stat $1.;
    datalines;
1 10 30 A
2 20 30 T
3 30 40 T
6 30 40 T
4 40 50 A
7 40 50 T
5 50 50 A
;
run;
data out(keep=id mgr_id);
    set input;
    format pstn1 id1 mpstn1 best.
           stat1 $1.;
    if _n_ = 1 then do;
        declare hash mgr(dataset:"input(rename=(pstn=pstn1 stat=stat1 mpstn=mpstn1 id=MGR_ID))");
        rc = mgr.definekey("pstn1");
        rc = mgr.definedata("MGR_ID");
        rc = mgr.definedata("mpstn1");
        rc = mgr.definedata("stat1");
        rc = mgr.definedone();
    end;
    found = 0;
    do while(^found);
        pstn1 = mpstn;
        rc = mgr.find();
        if stat1 = "A" then do;
            /*MGR Found*/
            found = 1;
        end;
        else if rc then do;
            /*RC^=0 when lookup fails*/
            MGR_ID = .;
            found = 1;
        end;
        else do;
            mpstn = mpstn1;
        end;
    end;
run;
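The walk up the position chain can be checked against the expected output outside SAS. This is a minimal Python sketch of the iterative lookup, not a translation of the hash code above. It differs in two ways, both assumptions added for the example: it looks for any ACTIVE holder of a position rather than the first record stored for that position, and it tracks visited positions so that a chain with no active manager ends instead of looping. The names build_lookups and find_manager are made up.

```python
# (id, pstn, mpstn, stat) rows from the input dataset.
rows = [
    (1, 10, 30, "A"), (2, 20, 30, "T"), (3, 30, 40, "T"), (6, 30, 40, "T"),
    (4, 40, 50, "A"), (7, 40, 50, "T"), (5, 50, 50, "A"),
]


def build_lookups(rows):
    """Map each position to its first ACTIVE holder and to its manager position."""
    active_holder = {}  # pstn -> id of an ACTIVE employee in that position
    parent = {}         # pstn -> manager position of that position
    for emp_id, pstn, mpstn, stat in rows:
        parent.setdefault(pstn, mpstn)
        if stat == "A":
            active_holder.setdefault(pstn, emp_id)
    return active_holder, parent


def find_manager(mpstn, active_holder, parent):
    """Climb from the manager position until an ACTIVE holder is found."""
    seen = set()  # guard against cycles such as a top position managing itself
    position = mpstn
    while position is not None and position not in seen:
        if position in active_holder:
            return active_holder[position]
        seen.add(position)
        position = parent.get(position)
    return None  # no ACTIVE manager anywhere up the chain


active_holder, parent = build_lookups(rows)
mgr_ids = {emp_id: find_manager(mpstn, active_holder, parent)
           for emp_id, _, mpstn, _ in rows}
for emp_id, mgr_id in mgr_ids.items():
    print(emp_id, mgr_id)
```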