Add observations to the end of a table in SAS

Let's say I have a table like:
Z 25 26 27 ... 100
0 300 200 200 100
1 278 262 177 45
2 168 222 122 22
(The 1st line is also the header).
Now I want to add 20 more observations to my table:
Z 25 26 27 ... 100
0 300 200 200 100
1 278 262 177 45
2 168 222 122 22
3 84 111 61 11
...
22 84 111 61 11
So that (all observations with Z = 3 to 22) = (observation with Z = 2) * 1/2. Is there any way to do that?

The special variable name list _numeric_ can be used to put all the numeric variables into an array. A loop over that array will let you divide each variable of a selected row by 2.
Example:
data have;
input Z _25 _26 _27 _100;
datalines;
0 300 200 200 100
1 278 262 177 45
2 168 222 122 22
;
run;
data newrows(drop=last_z);
set have nobs=nobs point=nobs; * read last row;
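array _ _numeric_; * array all numeric variables (declared before last_z exists, so last_z is not a member);
last_z = z; * save z before it is halved along with the other variables;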
do _n_ = 1 to dim(_);
_(_n_) = _(_n_) / 2; * divide each variable by 2;
end;
do z = last_z + 1 to last_z + 20; * output 20 'new' rows;
output;
end;
stop;
run;
proc append base=have data=newrows;
run;
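One detail worth keeping in mind with _numeric_ in an ARRAY statement: the list is resolved at compile time and covers only the numeric variables already defined at that point in the step, so a helper variable created earlier in the step becomes an array member too. A minimal sketch of that behaviour follows; the variable names a, b, c, nums and n_members are made up for illustration.

```sas
data _null_;
    a = 1;
    b = 2;
    array nums _numeric_;   /* resolved here: members are a and b only */
    c = 3;                  /* defined after the array, so not a member */
    n_members = dim(nums);  /* also defined after the array */
    put n_members=;
run;
```

The log should show n_members=2. Moving the `c = 3;` line above the ARRAY statement would make it a third member, which is why the position of helper variables relative to the ARRAY statement matters.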

Just to be clear, a SAS variable name cannot be a number. However, this gives you what you want:
data have;
input z a b c;
datalines;
0 300 200 200
1 278 262 177
2 168 222 122
;
data want;
set have end=lr;
array arr a--c;
output;
if lr;
do over arr;
arr = arr / 2;
end;
do _N_ = 1 to 20;
z + 1;
output;
end;
run;
Updated Code:
data have;
do z = 0, 1, 2;
array arr _25-_100;
do over arr;
arr = ceil(rand('uniform')*100);
end;
output;
end;
run;
data want;
set have end=lr;
array arr _25--_100;
output;
if lr;
do over arr;
arr = arr / 2;
end;
do _N_ = 1 to 20;
z + 1;
output;
end;
run;

Related

Average over number of variables where number of variables is dictated by separate column

I would like to create a new column whose values equal the average of values in other columns. But the number of columns I am taking the average of is dictated by a variable. My data look like this, with 'length' dictating the number of columns x1-x5 that I want to average:
data have;
input ID $ length x1 x2 x3 x4 x5;
datalines;
A 5 8 234 79 36 78
B 4 8 26 589 3 54
C 3 19 892 764 89 43
D 5 72 48 65 4 9
;
run;
I would like to end up with the below where 'avg' is the average of the specified columns.
data want;
input ID $ length avg;
datalines;
A 5 87
B 4 156.5
C 3 558.3
D 5 39.6
;
run;
Any suggestions? Thanks! Sorry about the awful title, I did my best.
You have to do a little more work since mean(of x[1]-x[length]) is not valid syntax. Instead, save the values to a temporary array and take the mean of it, then reset it at each row. For example, the temporary array would hold these values for each row:
tmp1 tmp2 tmp3 tmp4 tmp5
8 234 79 36 78
8 26 589 3 .
19 892 764 . .
72 48 65 4 9
data want;
set have;
array x[*] x:;
array tmp[5] _temporary_;
/* Reset the temp array */
call missing(of tmp[*]);
/* Save each value of x to the temp array */
do i = 1 to length;
tmp[i] = x[i];
end;
/* Get the average of the non-missing values in the temp array */
avg = mean(of tmp[*]);
drop i;
run;
Use an array: sum the first length elements, then divide the sum by length.
data have;
input ID $ length x1 x2 x3 x4 x5;
datalines;
A 5 8 234 79 36 78
B 4 8 26 589 3 54
C 3 19 892 764 89 43
D 5 72 48 65 4 9
;
data want;
set have;
array x(5) x1-x5;
sum=0;
do i=1 to length;
sum + x(i);
end;
avg = sum/length;
keep id length avg;
format avg 8.2;
run;
@Reeza's solution is good, but if x contains missing values the result is not always what you want, because the sum is still divided by the full length. It's better to use the SUM function and count the missing values. The code is also slightly simplified:
data want (drop=i s);
set have;
array a{*} x:;
s=0; nm=0;
do i=1 to length;
if missing(a{i}) then nm+1;
s=sum(s,a{i});
end;
avg=s/(length-nm);
run;
Rather than writing your own code to calculate means you could just calculate all of the possible means and then just use an index into an array to select the one you need.
data have;
input ID $ length x1 x2 x3 x4 x5;
datalines;
A 5 8 234 79 36 78
B 4 8 26 589 3 54
C 3 19 892 764 89 43
D 5 72 48 65 4 9
;
data want;
set have;
array means[5] ;
means[1]=x1;
means[2]=mean(x1,x2);
means[3]=mean(of x1-x3);
means[4]=mean(of x1-x4);
means[5]=mean(of x1-x5);
want = means[length];
run;

How to create an external report (.xlsx file) in SAS with filtered summation

I have sales data for 3 months (sale1, sale2 and sale3), and I need to show different summations under different filters.
data sales;
input area load $ prod : $ sale1 sale2 sale3;
diff=sale3-sale2;
datalines;
1 Y p1 109 117 138
1 N p1 23 29 20
1 Y p2 78 70 68
1 N p2 63 19 22
2 Y p1 49 36 32
2 N p1 50 39 44
2 Y p3 138 157 158
2 N p3 110 126 107
3 Y p2 251 267 259
3 N p2 182 184 160
;
run;
ods excel close;
ods excel file="/C:/data/t1.xlsx"
options (sheet_name="tab1" frozen_headers='3' frozen_rowheaders='2'
embedded_footnotes='yes' autofilter='1-8');
proc report data=sales nocenter;
column area load prod sale1 sale2 sale3 diff change;
define area -- diff/ display;
define sale1-- diff / analysis sum format=comma12. style(column)=[cellwidth=.5in];
define change / computed format=percent8.2 '% change' style(column)=[cellwidth=.8in];
compute change;
change = diff.sum/sale2.sum;
if change >= 0.1 then call define ("change",'STYLE','STYLE=[color=red fontweight=bold]');
if change <= -0.1 then call define ("change",'STYLE','STYLE=[color=blue fontweight=bold]');
endcomp;
rbreak after / summarize style=[background=lightblue font_weight=bold];
run;
ods excel close;
Without filtering, the report looks like this:
[image: original report]
But if I filter on load='Y' in the .xlsx file, I want to see a result like this:
[image: output with filter]
I wonder if anyone can help. Thanks!

Use the dif function to obtain the difference with several lags without specifying the number of lags

I want a new data set in which the variable y equals the value of x in row n minus the sum of the values of x in all previous rows.
The original data set:
data test;
input x;
datalines;
20
40
2
5
74
;
run;
I used the dif function, but it only returns the difference at lag one:
data want;
set test;
y = dif(x);
run;
And I want:
_n_ = 1 y = 20
_n_ = 2 y = 40 - 20 = 20
_n_ = 3 y = 2 - (40 + 20) = -58
_n_ = 4 y = 5 - (2 + 40 + 20) = -57
_n_ = 5 y = 74 - (5 + 2 + 40 + 20) = 7
Thanks.
No need for lag() or dif(). Just make another variable to retain the running total.
data want ;
set test;
y=x-cumm;
output;
cumm+x;
run;
I kept the extra column and output the values before updating the running total to make it clearer what value was used in the calculation of Y.
Obs x y cumm
1 20 20 0
2 40 20 20
3 2 -58 60
4 5 -57 62
5 74 7 67
Possible solution (thanks to Longfish for suggestions):
data want;
set test;
retain total 0;
total = total + x;
y = x - coalesce(lag(total), 0);
run;

SAS proc optmodel: trying to find the optimal cut-off

I'm new to proc optmodel and would appreciate any help to solve the problem at hand.
Here's my problem:
My dataset is like below:
data my_data;
input A B C;
cards;
0 240 3
3.4234 253 2
0 258 7
0 272 4
0 318 7
0 248 8
0 260 2
0.2555 305 5
0 314 5
1.7515 235 7
32 234 4
0 301 3
0 293 5
0 302 12
0 234 2
0 258 4
0 289 2
0 287 10
0 313 3
0.7725 240 7
0 268 3
1.4411 286 9
0 234 13
0.0474 318 2
0 315 4
0 292 5
0.4932 272 3
0 288 4
0 268 4
0 284 6
0 270 4
50.9188 293 3
0 272 3
0 284 2
0 307 3
;
run;
There are 3 variables (A, B, C), and I want to classify the observations into three classes (H, M, L) based on these 3 variables.
For class H, I want to maximize A and minimize B and C;
For class M, I want median values of A, B and C;
For class L, I want to minimize A and maximize B and C.
Also, the constraint is that fewer than 5% of all observations are classified into H, and fewer than 7% into M.
The final goal is to find the cut-offs of A, B and C for classifying observations into the three classes.
Since the three classes are equally weighted, I scaled the variables first and created a risk variable, where risk = A+(1-B)+(1-C);
Thanks in advance for any help.
My SAS code:
proc stdize data=my_data out=my_data1 method=RANGE;
var A B C;
run;
data new;
set my_data1;
risk = A+(1-B)+(1-C);
run;
proc sort data=new out=range;
by risk;
run;
proc optmodel;
/* read data */
set CUTOFF;
/* str risk_level {CUTOFF}; */
num a {CUTOFF};
num b {CUTOFF};
num c {CUTOFF};
read data my_data1 into CUTOFF=[_n_] a=A b=B c=C;
impvar risk{p in CUTOFF} = a[p]+(1-b[p])+(1-c[p]);
var indh {CUTOFF} binary;
var indmh {CUTOFF} binary;
var indo {CUTOFF} binary;
con sum{p in CUTOFF} indh[p] le 10;
con sum{p in CUTOFF} indmh[p] le 6;
con sum{p in CUTOFF} indo[p] le 19;
con class{p in CUTOFF}:indh[p]+indmh[p]+indo[p] le 1;
max new = sum{p in CUTOFF}(10*indh[p]+4*indmh[p]+indo[p])*risk[p];
solve;
print a b c risk indh indmh indo new;
quit;
So now my problem is how to find the minimum risk value in each class. Thanks!

SAS Finding Median without PROC MEANS

I have a dataset (already sorted by the Blood Pressure variable):
Blood Pressure
87
99
99
109
111
112
117
119
121
123
139
143
145
151
165
198
I need to find the median without using proc means.
For this data, there are 16 observations, so the median is (119+121)/2 = 120.
How can I write code that always finds the median, regardless of how many observations there are? It should work for both even and odd numbers of observations.
And of course, PROC MEANS is not allowed.
Thank you.
I use an FCMP function for this. This is a generic quantile function from my personal library. As the median is the 50th percentile, this will work.
options cmplib=work.fns;
data input;
input BP;
datalines;
87
99
99
109
111
112
117
119
121
123
139
143
145
151
165
198
;run;
proc fcmp outlib=work.fns.fns;
function qtile_n(p, arr[*], n);
alphap=1;
betap=1;
if n > 1 then do;
m = alphap+p*(1-alphap-betap);
i = floor(n*p+m);
g = n*p + m - i;
qp = (1-g)*arr[i] + g*arr[i+1];
end;
else
qp = arr[1];
return(qp);
endsub;
quit;
proc sql noprint;
select count(*) into :n from input;
quit;
data _null_;
set input end=last;
array v[&n] _temporary_;
v[_n_] = bp;
if last then do;
med = qtile_n(.5,v,&n);
put med=;
end;
run;
Assuming you have a data set named HAVE sorted by the variable BP, you can try this:
data want(keep=median);
if mod(nobs,2) = 0 then do; /* even number of records in data set */
j = nobs / 2;
set HAVE(keep=bp) point=j nobs=nobs;
k = bp; /* hold value in temp variable */
j + 1;
set HAVE(keep=bp) point=j nobs=nobs;
median = (k + bp) / 2;
end;
else do;
j = round( nobs / 2 );
set HAVE(keep=bp) point=j nobs=nobs;
median = bp;
end;
put median=; /* if all you want is to see the result */
output; /* if you want it in a new data set */
stop; /* stop required to prevent infinite loop */
run;
This is "old fashioned" code; I'm sure someone can show another solution using hash objects that might eliminate the requirement to sort the data first.
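A different route (not the hash-object approach mentioned above) is to load the values into a temporary array and call the MEDIAN function, which does not need sorted input and ignores missing values. The following is only a sketch under assumptions: it takes a data set named HAVE with a numeric variable BP and at least one observation, and the names n, v and med are chosen here for illustration.

```sas
/* Count the rows so the temporary array can be sized at compile time */
proc sql noprint;
    select count(*) into :n from have;
quit;

data _null_;
    array v[&n] _temporary_;      /* one slot per observation */
    set have(keep=bp) end=last;
    v[_n_] = bp;                  /* store each value as it is read */
    if last then do;
        med = median(of v[*]);    /* MEDIAN handles even and odd counts */
        put med=;
    end;
run;
```

Because the whole column is held in memory, this suits small to moderate data sets. The row count comes from a separate PROC SQL step, as in the FCMP answer above.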