SAS: divide table by another table

I am pretty new to SAS and am struggling to divide one table by another. The tables have the same size.
My goal is to divide each element of table_1 by the corresponding element of table_2, producing a new table.
Google advises using SAS/IML, but I have no access to it.
Is there any way to do this in a data step? Any other ideas?
For instance, the first table looks like:
30 30
30 30
Second table:
2 3
5 6
Then output table should be:
15 10
6 5
Thanks a lot in advance!

One way to do it would be to merge the two tables together and perform the division.
data dividend;
a = 30; b = 30; output;
a = 30; b = 30; output;
run;
data divisor;
c = 2; d = 3; output;
c = 5 ; d = 6 ; output;
run;
data comb;
merge dividend divisor;
q1 = a/c;
q2 = b/d;
keep q1 q2;
run;
This assumes that there is a 1-to-1 correspondence between the dividend and divisor rows.
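When there are more than a couple of columns, writing one assignment per column gets tedious. A minimal sketch of the same merge using arrays follows; it assumes the dividend and divisor columns are listed in matching order, and the dataset name comb_arr and the zero/missing guard are illustrative choices, not part of the answer above.

```sas
/* Sketch: element-wise division with arrays.
   Assumes the column lists below match your real variable names. */
data comb_arr;
  merge dividend divisor;          /* no BY statement: rows are paired by position */
  array num{*} a b;                /* dividend columns */
  array den{*} c d;                /* divisor columns, in the same order */
  array q{2} q1-q2;                /* quotients */
  do i = 1 to dim(num);
    /* leave the quotient missing when the divisor is zero or missing */
    if den{i} not in (., 0) then q{i} = num{i} / den{i};
  end;
  keep q1 q2;
run;
```

Only the two array definitions need to change when the tables grow wider.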
EDITED TO RESPOND TO QUESTION IN COMMENTS
Assuming you have years 2015 to 2019, each with 4 quarters, you could write a macro loop:
%macro divide;
%let years = %str(2015 2016 2017 2018 2019);
%let qtrs = %str(Q1 Q2 Q3 Q4);
data comb;
merge dividend divisor;
%let i = 1;
%do %while (%scan(&years, &i) ne );
%let year = %scan(&years, &i);
%let j = 1;
%do %while (%scan(&qtrs, &j) ne );
%let q = %scan(&qtrs, &j);
R_&q._&year = &q._&year._D / &q._&year._A;
%let j = %eval(&j + 1);
%end;
%let i = %eval(&i + 1);
%end;
keep R_:;
run;
%mend;
%divide;
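To check what the loop generates, one option is to switch on MPRINT before calling the macro. The sketch below assumes the merged data really contains variables named like Q1_2015_D and Q1_2015_A, as the assignment inside the macro implies.

```sas
options mprint;   /* echo the statements generated by the macro to the log */
%divide;

/* For year 2015 the inner loop generates these four statements:
     R_Q1_2015 = Q1_2015_D / Q1_2015_A;
     R_Q2_2015 = Q2_2015_D / Q2_2015_A;
     R_Q3_2015 = Q3_2015_D / Q3_2015_A;
     R_Q4_2015 = Q4_2015_D / Q4_2015_A;
   and the same pattern repeats for each later year in the list. */
```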

You probably should store your data in a different way to make this problem easier (and most problems easier). Instead of storing a "matrix" where the row number and column number have some hidden meaning you could create a dataset where that meaning is stored into variables. So instead of:
30 30
30 30
You could have:
Row Col Value
1 1 30
1 2 30
2 1 30
2 2 30
Just use meaningful names instead of ROW,COL and VALUE. Like COUNTRY, YEAR, POPULATION.
Then your division problem becomes both simpler and clearer. Combine the two datasets by the id variables and divide the variables to make a new variable.
data death_rate ;
merge deaths population;
by country year ;
death_rate = deaths / population;
run;
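If the data already exists in the matrix layout, it can be converted to the long layout with an array. This is only a sketch: the dataset names matrix and long and the variable names row, col and value are placeholders that you would replace with meaningful names.

```sas
/* Hypothetical matrix-style input: one observation per matrix row */
data matrix;
  input a b;
  datalines;
30 30
30 30
;
run;

/* One output observation per cell, with its position stored as data */
data long;
  set matrix;
  array cols{*} a b;
  row = _n_;                     /* observation number becomes the row id */
  do col = 1 to dim(cols);
    value = cols{col};
    output;
  end;
  keep row col value;
run;
```

Two tables reshaped this way can then be merged by row and col and divided in a single assignment.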
If you do want to work with matrices, then look into PROC IML. Here is an example from the documentation.
proc iml;
a = {1 2,
3 4};
b = {5 6,
7 8};
c = a/b; /* element-wise division */
print c;
quit;

Related

Loop through SAS variables and create data sets

I have a SAS data set t3. I want to run a data step inside a loop through a set of variables to create additional sets based on the variable value = 1, and rank two variables bal and otheramt in each subset, and then merge the ranks for each subset onto the original data set. Each rank column needs to be dynamically named so I know what subset is getting ranked. I know how to do proc rank and macros basically but do not know how to do this in the most dynamic way inside of a macro. Can you assist?
ID bal otheramt firstvar secondvar lastvar
444 581 100 1 1
555 255 200 1 1 1
666 255 300 -------------- 1 --------------
%macro dog();
data new;
set t3;
ARRAY Indicators(5) FirstVar--LastVar;
/*create data set for each of the subsets if firstvar = 1, secondvar = 1 ... lastvar = 1 */
/*for each new data set, rank by bal and otheramt*/
/*name the new rank columns [FirstVar]BalRank, [FirstVar]OtherAmtRank; */
/*merge the new ranks onto the original data set by ID*/
%mend;
%dog()
The Proc rank section would be something like this, but I would need the rank columns to have information about what subset I am ranking.
proc rank data=subset1 out=subset1ranked;
var bal otheramt;
ranks bal_rank otheramt_rank;
run;
Instead of using a macro, reshape the data so that each step stays simple.
Example:
Rows are split into multiple rows based on the flags so that BY-group processing in PROC RANK can occur. Two transposes are then required to reshape the results back to a single row per id.
data have;
call streaminit(20230216);
do id = 1 to 100;
foo = rand('integer', 50,150);
bar = rand('integer', 100,200);
flag1 = rand('integer', 0, 1);
flag2 = rand('integer', 0, 1);
flag3 = rand('integer', 0, 1);
output;
end;
run;
data step1;
set have;
/* important: the group value becomes part of the variable name later */
if flag1 then do; group='flag1_'; output; end;
if flag2 then do; group='flag2_'; output; end;
if flag3 then do; group='flag3_'; output; end;
drop flag:;
run;
proc sort data=step1;
by group;
run;
proc rank data=step1 out=step2;
by group;
var foo bar;
ranks foo_rank bar_rank;
run;
proc sort data=step2;
by id group;
run;
* pivot (reshape) so there is one row per ranked var;
proc transpose data=step2 out=step3(drop=_label_);
by id foo bar group;
var foo_rank bar_rank;
run;
* pivot again so there is one row per id;
proc transpose data=step3 out=step4(drop=_name_);
by id;
var col1;
id group _name_;
run;
* merge so those 0 0 0 flag rows remain intact;
data want;
merge have step4;
by id;
run;
Since we don't have much sample data, I created test data from sashelp.class with some indicator variables like yours.
data have;
set sashelp.class;
firstvar=round(rand('uniform',1));
secondvar=round(rand('uniform',1));
thirdvar=round(rand('uniform',1));
drop sex weight;
run;
Partial output:
Name Age Height firstvar secondvar thirdvar
Alfred 14 69 1 0 1
Alice 13 56.5 0 1 1
Barbara 13 65.3 1 0 0
Carol 14 62.8 0 0 0
To dynamically rank data based on indicator variables, I created a macro that accepts a list of indicators and rank variables. The 2 lists help to create the specific variable names you requested. Here's the macro call:
%rank(indicators=firstvar secondvar thirdvar,
rank_vars=age height);
Here's part of the final output. Notice the indicators in the sample output above coincide with the ranks in this output. Also note that Carol is not in the output because she had no indicators set to 1.
Name Age Height firstvar_age_rank firstvar_height_rank secondvar_age_rank secondvar_height_rank thirdvar_age_rank thirdvar_height_rank
Alfred 14 69 8 11 . . 6.5 10
Alice 13 56.5 . . 3.5 2 4.5 2
Barbara 13 65.3 6.5 8 . . . .
Henry 14 63.5 . . 5.5 5 . .
The full macro is listed below. It has 3 parts.
1. Create a temp data set with a group variable that contains the number of the indicator variable based on the order of the variable in the list. Whenever an indicator = 1 the obs is output. If an obs has all 3 indicators set to 1 then it will be output 3 times with the group variable set to the number of each indicator variable. This step is important because proc rank will rank groups independently.
2. Generate the rankings on the temp data set. Each group will be ranked independently of the other groups and can be done in one step.
3. Construct the final data set by essentially transposing the ranked data into columns.
%macro rank(indicators=, rank_vars=);
%let cnt_ind = %sysfunc(countw(&indicators));
%let cnt_vars = %sysfunc(countw(&rank_vars));
data temp;
set have;
array indicators(*) &indicators;
do i = 1 to dim(indicators);
if indicators(i) = 1 then do;
group = i; * create a group based on order of indicators;
output; * an obs can be output multiple times;
end;
end;
drop i &indicators;
run;
proc sort data=temp;
by group;
run;
* Generate rankings by group;
proc rank data=temp out=ranks;
by group;
var &rank_vars;
ranks
%let vars = ;
%do i = 1 %to &cnt_vars;
%let var = %scan(&rank_vars, &i);
%let vars = &vars &var._rank;
%end;
&vars;
run;
proc sort data=ranks;
by name group;
run;
* Construct final data set by transposing the ranks into columns;
data want;
set ranks;
by name;
* retain statement to declare new variables and retain values;
retain
%let vars = ;
%do i = 1 %to &cnt_ind;
%let ivar = %scan(&indicators, &i);
%do j = 1 %to &cnt_vars;
%let jvar = %scan(&rank_vars, &j);
%let vars = &vars &ivar._&jvar._rank;
%end;
%end;
&vars;
if first.name then call missing (of &vars);
* option 1: build series of IF statements;
%let vars = ;
%do i = 1 %to &cnt_ind;
%let ivar = %scan(&indicators, &i);
%str(if group = &i then do;)
%do j = 1 %to &cnt_vars;
%let jvar = %scan(&rank_vars, &j);
%let newvar = &ivar._&jvar._rank;
%str(&newvar = &jvar._rank;)
%end;
%str(end;)
%end;
if last.name then output;
drop group
%let vars = ;
%do i = 1 %to &cnt_vars;
%let var = %scan(&rank_vars, &i);
%let vars = &vars &var._rank;
%end;
&vars;
run;
%mend;
When constructing the final data set and transposing the rank variables, there are a couple of options. The first option shown above is to dynamically build a series of if statements. Here is what the code generates:
MPRINT(RANK): * option 1: build series of IF statements;
MPRINT(RANK): if group = 1 then do;
MPRINT(RANK): firstvar_age_rank = age_rank;
MPRINT(RANK): firstvar_height_rank = height_rank;
MPRINT(RANK): end;
MPRINT(RANK): if group = 2 then do;
MPRINT(RANK): secondvar_age_rank = age_rank;
MPRINT(RANK): secondvar_height_rank = height_rank;
MPRINT(RANK): end;
MPRINT(RANK): if group = 3 then do;
MPRINT(RANK): thirdvar_age_rank = age_rank;
MPRINT(RANK): thirdvar_height_rank = height_rank;
MPRINT(RANK): end;
The 2nd option is to use an array and mathematically calculate the index into the array by the group number and variable number. Here is the snippet of macro code to replace the if series code:
* option 2: create arrays and calculate index into array
* by group number and variable number;
array ranks(*) &vars;
array rankvars(*)
%let vars = ;
%do i = 1 %to &cnt_vars;
%let var = %scan(&rank_vars, &i);
%let vars = &vars &var._rank;
%end;
&vars;
%str(idx = dim(rankvars) * (group - 1);)
%str(do i = 1 to dim(rankvars);)
%str(ranks(idx + i) = rankvars(i);)
%str(end;)
Here is the generated code:
MPRINT(RANK): * option 2: create arrays and calculate index into array * by group number and variable number;
MPRINT(RANK): array ranks(*) firstvar_age_rank firstvar_height_rank secondvar_age_rank secondvar_height_rank thirdvar_age_rank
thirdvar_height_rank;
MPRINT(RANK): array rankvars(*) age_rank height_rank;
MPRINT(RANK): idx = dim(rankvars) * (group - 1);
MPRINT(RANK): do i = 1 to dim(rankvars);
MPRINT(RANK): ranks(idx + i) = rankvars(i);
MPRINT(RANK): end;
It takes a minute to understand the array option, but once you do, it is preferable to generating if statements. As the number of variables increases, the code generated by the array option stays the same size and operates more efficiently.
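The index arithmetic is easier to follow with concrete numbers. The sketch below is a standalone illustration, not part of the macro: it assumes 3 indicator groups and 2 rank variables, as in the example call, and writes the computed array position for each combination to the log.

```sas
data _null_;
  nvars = 2;                      /* number of rank variables (age, height) */
  do group = 1 to 3;              /* one group per indicator variable */
    idx = nvars * (group - 1);    /* offset of this group's first slot */
    do i = 1 to nvars;
      pos = idx + i;              /* position in the 6-element ranks array */
      put group= i= pos=;
    end;
  end;
run;
```

Group 1 maps to positions 1 and 2, group 2 to positions 3 and 4, and group 3 to positions 5 and 6, which matches the order of the variables in the ranks array.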

Divide a dataset into subsets based on a column and perform a repeated operation for subsets

I need to perform the same operation on many different periods. My sample data has two periods: 402 and 403.
I cannot understand the concept of how I can make a loop that will do it for me.
At the end, I'd like to have final1 for period 402, final2 for period 403 etc.
Sample data that I use for testing:
data one;
input period $ a $ b $ c $ d e;
cards;
402 a . a 1 3
402 . b . 2 4
402 a a a . 5
402 . . b 3 5
403 a a a . 6
403 a a a . 7
403 a a a 2 8
;
run;
This is how I manually select one period from the data:
data new;
set one;
where period='402';
run;
This is how I calculate different things for the given period e.g. number of missing data, non-missing, total:
1 - For numeric variables:
proc iml;
use new;
read all var _NUM_ into x[colname=nNames];
n = countn(x,"col");
nmiss = countmiss(x,"col");
ntotal = n + nmiss;
2 - and similarly for char variables:
read all var _CHAR_ into x[colname=cNames];
close new;
c = countn(x,"col");
cmiss = countmiss(x,"col");
ctotal = c + cmiss;
Save numeric and char results:
create cnt1Data var {nNames n nmiss ntotal};
append;
close cnt1Data;
create cnt2Data var {cNames c cmiss ctotal};
append;
close cnt2Data;
Rename columns to be the same:
data cnt1Datatemp;
set cnt1Data;
rename nNames = Name n = nonMissing nmiss = missing ntotal = total;
run;
data cnt2Datatemp;
set cnt2Data;
rename cNames = Name c = nonMissing cmiss = missing ctotal = total;
run;
and merge data into the final set:
data final;
set cnt1Datatemp cnt2Datatemp;
run;
Final data for period 402 should look like:
a b c d e
2 2 1 1 0 - missing
2 2 3 3 4 - non-missing
4 4 4 4 4 - total
and respectively for period 403:
a b c d e
0 0 0 2 0 - missing
3 3 3 1 3 - non-missing
3 3 3 3 3 - total
You can do something similar with a simple SQL query.
create table miss_count as select period
, sum(missing(A)) as A
, sum(missing(B)) as B
...
from have
group by period
;
Results:
period a b c d e
402 2 2 1 1 0
403 0 0 0 2 0
If you add in
, count(*) as nobs
then you have all the information you need to calculate all of the counts you wanted.
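As a sketch of that idea for a single variable, the query below derives the non-missing count by subtracting the missing count from the row count. It assumes the sample dataset one from the question; the table name counts_a and the column names a_missing and a_nonmissing are made up for illustration.

```sas
proc sql;
  create table counts_a as
  select period
       , count(*)                   as nobs          /* total rows in the period */
       , sum(missing(a))            as a_missing     /* rows where a is missing */
       , count(*) - sum(missing(a)) as a_nonmissing  /* the remainder */
  from one
  group by period
  ;
quit;
```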
If the number of variables is short enough you can even generate the code into a macro variable (limit of 64K bytes in a macro variable)
proc sql noprint;
select catx(' ','sum(missing(',nliteral(name),')) as',nliteral(name))
into :varlist separated by ','
from dictionary.columns
where libname='WORK' and memname='ONE' and lowcase(name) ne 'period'
;
create table miss_count as select period,count(*) as nobs,&varlist
from one
group by period
;
quit;
Results:
period nobs a b c d e
402 4 2 2 1 1 0
403 3 0 0 0 2 0
It is much easier to find this information in SQL:
proc sql;
select sum(a is not missing) as fil_a
, sum(a is missing) as mis_a
, count(*) as tot_a
from one
where period eq '402';
quit;
You can even handle all periods at once using group by.
There are a few ways to make this work for all variables in a dataset (except for some group by variables). For instance:
%macro count_missing();
proc sql;
select count(*), name
into :no_var, :var_list separated by ' '
from sasHelp.vcolumn
where libName eq 'WORK' and memName eq 'ONE' and upcase(name) ne 'PERIOD';
create view count_missing as
select count(*) as total
%do var_nr = 1 %to &no_var;
%let var = %scan(&var_list, &var_nr);
, sum(&var is missing) as mis_&var
%end;
from work.one
group by period;
quit;
data report_missing;
set count_missing;
format count_of $32.;
count_of = 'missing';
%do var_nr = 1 %to &no_var;
%let var = %scan(&var_list, &var_nr);
&var = mis_&var;
%end;
output;
count_of = 'non missing';
%do var_nr = 1 %to &no_var;
%let var = %scan(&var_list, &var_nr);
&var = total - mis_&var;
%end;
output;
count_of = 'total';
%do var_nr = 1 %to &no_var;
%let var = %scan(&var_list, &var_nr);
&var = total;
%end;
output;
run;
%mend;
%count_missing();
You don't need IML to summarize data over observations. You can do that with a retain statement too. Moreover, using BY processing with first. and last., you can process all periods in one go.
data final;
set one;
by period;
retain mis_a total;
/* reset the counters at the start of each period */
if first.period then do;
mis_a = 0;
total = 0;
end;
if missing(a) then mis_a + 1;
total + 1;
/* keep only one summary row per period */
if last.period;
fil_a = total - mis_a;
keep period mis_a fil_a total;
run;
This is typically the fastest way to handle a big dataset, provided the data is already sorted by period.
To make it work for a set of variables not known upfront, you can apply the same techniques as in my other solution.

SAS: iterate from beginning to end date in a macro

I have a dataset like this:
DATA tmp;
INPUT
identifier $
d0101 d0102 d0103 d0104 d0105 d0106
d0107 d0108 d0109 d0110 d0111 d0112
;
DATALINES;
a 1 2 3 4 5 6 7 8 9 10 11 12
b 4 5 7 4 5 6 7 6 9 10 3 12
c 5 2 3 5 5 4 7 8 3 1 1 2
;
RUN;
And I'm trying to create a dataset like this:
DATA tmp;
INPUT
identifier $ day value
;
DATALINES;
a '01JAN2018'd 1
a '02JAN2018'd 2
a '03JAN2018'd 3
a '04JAN2018'd 4
a '05JAN2018'd 5
a '06JAN2018'd 6
a '07JAN2018'd 7
a '08JAN2018'd 8
a '09JAN2018'd 9
a '10JAN2018'd 10
a '11JAN2018'd 11
a '12JAN2018'd 12
b '01JAN2018'd 4
b '02JAN2018'd 5
b '03JAN2018'd 7
...
;
RUN;
I know the syntax for "melting" a dataset like this - I have completed a similar macro for columns that represent a particular value in each of the twelve months in a year.
What I'm struggling with is how to iterate through all days year-to-date (the assumption is that the have dataset has all days YTD as columns).
I'm used to Python, so something I might do there would be:
>>> import datetime
>>>
>>> def dates_ytd():
... end_date = datetime.date.today()
... start_date = datetime.date(end_date.year, 1, 1)
... diff = (end_date - start_date).days
... for x in range(0, diff + 1):
... yield end_date - datetime.timedelta(days=x)
...
>>> def create_date_column(dt):
... day, month = dt.day, dt.month
... day_fmt = '{}{}'.format('0' if day < 10 else '', day)
... month_fmt = '{}{}'.format('0' if month < 10 else '', month)
... return 'd{}{}'.format(month_fmt, day_fmt)
...
>>> result = [create_date_column(dt) for dt in dates_ytd()]
>>>
>>> result[:5]
['d1031', 'd1030', 'd1029', 'd1028', 'd1027']
>>> result[-5:]
['d0105', 'd0104', 'd0103', 'd0102', 'd0101']
Here is my SAS attempt:
%MACRO ITER_DATES_YTD();
DATA _NULL_;
%DO v_date = '01012018'd %TO TODAY();
%PUT d&v_date.;
* Will do "melting" logic here";
%END
%MEND ITER_DATES_YTD;
When I run this, using %ITER_DATES_YTD();, nothing is even printed to my log. What am I missing here? I basically want to iterate through "YTD" columns, like these d0101, d0102, d0103, ....
This is more a transposition problem than a macro / data step problem.
The core problem is that you have data in the metadata, meaning the 'date' is encoded in the column names.
Example 1:
Transpose the data, then use the d<mm><dd> _name_ values to compute an actual date.
proc transpose data=have out=have_t(rename=col1=value);
by id;
run;
data want (keep=id date value);
set have_t;
* convert the day-in-year metadata held in the variable name into regular data;
date = input (cats(year(today()),substr(_name_,2)),yymmdd10.);
format date yymmdd10.;
run;
Example 2:
Do an array-based transposition. The D<mm><dd> variables play the role of value_at_date and are easily arrayed thanks to the consistent naming convention. The VNAME function retrieves the original variable name from the array reference, and a date value is computed from the <mm><dd> portion.
data want;
set have;
array value_at_date d:;
do index = 1 to dim(value_at_date);
date = input(cats(year(today()),substr(VNAME(value_at_date(index)),2)), yymmdd10.);
value = value_at_date(index);
output;
end;
format date yymmdd10.;
keep id date value;
run;
To iterate through dates in macro code, convert the start and end dates to numbers first, then format each number back into its month and day parts.
%macro iterateDates();
data _null_;
%do i = %sysFunc(inputN(01012018,ddmmyy8.)) %to %sysFunc(today()) %by 1;
%put d%sysFunc(putN(&i, mmddyy4.));
%end;
run;
%mend iterateDates;
%iterateDates();
I think that a date literal such as '01JAN2018'd is only evaluated in the data step, not in macro code (and '01012018'd is not a valid date literal in any case, since date literals use the ddMONyyyy form). Keep in mind that macro code is executed first, and only then is the generated data step executed. You can think of it as building SAS code with SAS macros and then running it.
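If the column names are only needed inside a data step, the loop does not have to be a macro loop at all. The sketch below is one possible alternative under that assumption: it builds the d<mm><dd> names for the first twelve days of 2018 with an ordinary DO loop. The variable names dt and colname are arbitrary.

```sas
data _null_;
  length colname $5;
  /* date literals work here because this is data step code, not macro code */
  do dt = '01JAN2018'd to '12JAN2018'd;
    colname = cats('d', put(month(dt), z2.), put(day(dt), z2.));
    put colname=;                /* writes the names d0101 through d0112 to the log */
  end;
run;
```

Replacing the upper bound with today() would cover every day up to the current date.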

Creating variables that count the "levels" of other variables

I have a dataset analogous to the simplified table below (let's call it "DS_have"):
SurveyID Participant FavoriteColor FavoriteFood SurveyMonth
S101 G92 Blue Pizza Jan
S102 B34 Blue Cake Feb
S103 Z28 Green Cake Feb
S104 V11 Red Cake Feb
S105 P03 Yellow Pizza Mar
S106 A71 Red Pizza Mar
S107 C48 Green Cake Mar
S108 G92 Blue Cake Apr
...
I'd like to create a set of numeric variables that identify the discrete categories/levels of each variable in the dataset above. The result should look like the following dataset ("DS_want"):
SurveyID Participant FavoriteColor FavoriteFood SurveyMonth ColorLevels FoodLevels ParticipantLevels MonthLevels
S101 G92 Blue Pizza Jan 1 1 1 1
S102 B34 Blue Cake Feb 1 2 2 2
S103 Z28 Green Cake Feb 2 2 3 2
S104 V11 Red Cake Feb 3 2 4 2
S105 P03 Yellow Pizza Mar 4 1 5 3
S106 A71 Red Pizza Mar 3 1 6 3
S107 C48 Green Cake Mar 2 2 7 3
S108 G92 Blue Cake Apr 1 2 1 4
...
Essentially, I want to know what syntax I should use to generate unique numerical values for each "level" or category of variables in the DS_Have dataset. Note that I cannot use conditional if/then statements to create the values in the ":Levels" variables for each category, as the number of levels for some variables is in the thousands.
One straightforward solution is to use proc tabulate to generate a tabulated list, then iterate over that and create informats to convert the text to a number; then you just use input to code them.
*store variables you want to work with in a macro variable to make this easier;
%let vars=FavoriteColor FavoriteFood SurveyMonth;
*run a tabulate to get the unique values;
proc tabulate data=have out=freqs;
class &vars.;
tables (&vars.),n;
run;
*if you prefer to have this in a particular order, sort by that now - otherwise you may have odd results (as this will). Sort by _TYPE_ then your desired order.;
*Now create a dataset to read in for informat.;
data for_fmt;
if 0 then set freqs;
array vars &vars.;
retain type 'i';
do label = 1 by 1 until (last._type_); *for each _type_, start with 1 and increment by 1;
set freqs;
by _type_ notsorted;
which_var = find(_type_,'1'); *parses the '100' value from TYPE to see which variable this row is doing something to. May not work if many variables - need another solution to identify which (depends on your data what works);
start = coalescec(vars[which_var]);
fmtname = cats(vname(vars[which_var]),'I');
output;
if first._type_ then do; *set up what to do if you encounter a new value not coded - set it to missing;
hlo='o'; *this means OTHER;
start=' ';
label=.;
output;
hlo=' ';
label=1;
end;
end;
run;
proc format cntlin=for_fmt; *import to format catalog via PROC FORMAT;
quit;
Then code them like this (you might create a macro to do this looping over the &vars macro variable).
data want;
set have;
color_code = input(FavoriteColor,FavoriteColorI.);
run;
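A macro along these lines could apply every informat in one pass. Treat it as a sketch: it assumes the informats were created by the PROC FORMAT step above with the <variable name>I naming, that the input dataset is called have, and the macro name code_levels and the _code suffix are invented here.

```sas
%macro code_levels(vars);
  %local i var;
  data want;
    set have;
    %do i = 1 %to %sysfunc(countw(&vars));
      %let var = %scan(&vars, &i);
      /* e.g. FavoriteColor_code = input(FavoriteColor, FavoriteColorI.); */
      &var._code = input(&var, &var.I.);
    %end;
  run;
%mend code_levels;

%code_levels(&vars);
```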
Another approach - create a hash object to keep track of the levels encountered for each variable, and read the dataset twice via a double DOW-loop, applying the level numbers on the second pass. It's perhaps not as elegant as Joe's solution, but it should use slightly less memory and I suspect it will scale to a somewhat larger number of variables.
%macro levels_rename(DATA,OUT,VARS,NEWVARS);
%local i NUMVARS VARNAME;
data &OUT;
if 0 then set &DATA;
length LEVEL 8;
%let i = 1;
%let VARNAME = %scan(&VARS,&i);
%do %while(&VARNAME ne );
declare hash h&i();
rc = h&i..definekey("&VARNAME");
rc = h&i..definedata("LEVEL");
rc = h&i..definedone();
%let i = %eval(&i + 1);
%let VARNAME = %scan(&VARS,&i);
%end;
%let NUMVARS = %eval(&i - 1);
do _n_ = 1 by 1 until(eof);
set &DATA end = eof;
%do i = 1 %to &NUMVARS;
LEVEL = h&i..num_items + 1;
rc = h&i..add();
%end;
end;
do _n_ = 1 to _n_;
set &DATA;
%do i = 1 %to &NUMVARS;
rc = h&i..find();
%scan(&NEWVARS,&i) = LEVEL;
%end;
output;
end;
drop LEVEL;
run;
%mend;
%levels_rename(sashelp.class,class_renamed,NAME SEX, NAME_L SEX_L);

How do I perform a calculation on the last n observations

How can I perform a calculation on the last n observations in a data set?
For example, if I have 10 observations, I would like to create a variable that sums the last 5 values of another variable. Please do not suggest that I lag 5 times or use module ( N ). I need a slightly more elegant solution than that.
In the code below, alpha is the data set that I have and bravo is the one I need.
data alpha;
input lima @@;
cards ;
3 1 4 21 3 3 2 4 2 5
;
run ;
data bravo;
input lima juliet;
cards;
3 .
1 .
4 .
21 .
3 32
3 32
2 33
4 33
2 14
5 16
;
run;
Thank you in advance!
You can do this in the data step or using PROC EXPAND from SAS/ETS if available.
For the data step the idea is that you start with a cumulative sum (summ), but keep track of the number of values that were added so far (ninsum). Once that reaches 5, you start outputting the cumulative sum to the target variable (juliet), and from the next step you start subtracting the lagged-5 value to only store the sum of the last five values.
data beta;
set alpha;
retain summ ninsum 0;
summ + lima;
ninsum + 1;
l5 = lag5(lima);
if ninsum = 6 then do;
summ = summ - l5;
ninsum = ninsum - 1;
end;
if ninsum = 5 then do;
juliet = summ;
end;
run;
proc print data=beta;
run;
However, there is a procedure that can do all kinds of cumulative and moving-window calculations: PROC EXPAND, in which this is really just one line. We just tell it to calculate the backward moving sum in a window of width 5 and set the first 4 observations to missing (by default it pads the series with 0's on the left).
proc expand data=alpha out=gamma;
convert lima = juliet / transformout=(movsum 5 trimleft 4);
run;
proc print data=gamma;
run;
Edit
If you want to do more complicated calculations, you need to carry the previous values in retained variables. I thought you wanted to avoid that, but here it is:
data epsilon;
set alpha;
array lags {5};
retain lags1 - lags5;
/* shift over lagged values, and add the current value at the beginning */
do i=5 to 2 by -1;
lags{i} = lags{i-1};
end;
lags{1} = lima;
/* do whatever calculation is needed on the last five values */
/* juliet stays missing until five values have been collected */
juliet = 0;
do i=1 to 5;
juliet = juliet + lags{i};
end;
drop i;
run;
proc print data=epsilon;
run;
I can offer a rather ugly solution:
1. Run a data step and add an increasing number to each group.
2. Run a SQL step and add a column of max(group).
3. Run another data step and check if the value from (2)-(1) is less than 5. If so, assign to the _num_to_sum_ variable (for example) the value that you want to sum, otherwise leave it blank or assign 0.
4. Last, do a SQL step with sum(_num_to_sum_) and group the results by the grouping variable from (1).
EDIT: I have added a live example of the concept in a slightly more compact form.
data sourcetable;
input var1 $ var2;
cards;
aaa 3
aaa 5
aaa 7
aaa 1
aaa 11
aaa 8
aaa 6
bbb 3
bbb 2
bbb 4
bbb 6
;
run;
data step1;
set sourcetable;
by var1;
retain obs 0;
if first.var1 then obs = 0;
else obs = obs+1;
if obs >=5 then to_sum = var2;
run;
proc sql;
create table rezults as
select distinct var1, sum(to_sum) as needed_summs
from step1
group by var1;
quit;
In case anyone reads this :)
I solved it the way I needed it to be solved, although now I am more curious which of the two (the retain approach and my solution) is more efficient in terms of processing time.
Here is my solution:
data bravo(keep = var1 summ);
set alpha;
do i=_n_ to _n_-4 by -1;
set alpha(rename=var1=var2) point=i;
summ=sum(summ,var2);
end;
run;