SAS: sum across columns and set to missing if any missing

Is there a nice way to sum across columns, but set the answer to missing if any are missing? Effectively the following:
if cmiss(of eTimeOffWork -- eNotAbleToDoJob) = 0
then work = sum(of eTimeOffWork -- eNotAbleToDoJob);
Is there a function like sum that does this out of the box?

Standard addition will do this, but you will have to specify every variable:
foo = var1 + var2 + var3 + ... + varn
The way you're doing it is the easiest way.
You could turn it into your own function if you wanted to, which would make the data step code a bit cleaner. It gets tricky with PROC FCMP because you must pass the variables to sum as an array. I wouldn't recommend doing this, but it is an option available to you.
An example that does this is provided for you in the SAS documentation:
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/p0f3ukbprxtdfrn1cljibie17wzn.htm
proc fcmp outlib=work.funcs.funcs;
function summiss(n[*]) varargs;
sum = 0;
do i = 1 to dim(n);
sum = sum + n[i];
end;
return(sum);
endfunc;
run;
options cmplib=work.funcs;
data test;
array vars[5] (1, 2, 3, 4, .);
foo = summiss(vars);
run;
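For comparison, here is a minimal sketch of the CMISS approach from the question, run on made-up data. The dataset name demo and the variables a, b, c and total are hypothetical stand-ins for the real columns.

```sas
data demo;
  input a b c;
  /* total is assigned only when none of a--c is missing; */
  /* otherwise it keeps its initial missing value         */
  if cmiss(of a--c) = 0 then total = sum(of a--c);
  datalines;
1 2 3
4 . 6
;
run;
```

The first row has no missing inputs, so total is their sum; the second row contains a missing value, so total stays missing.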

Related

Assign values to a variable with do loop in SAS

I have a dataset with column names payment_201601, payment_201602, ..., payment_202112.
I would like to convert these columns into two columns, payments and paymonth.
This is what I came up with:
ARRAY payment {12} payment_201601-payment_201612;
DO i = 1 TO 12;
IF payment{i} > 0 THEN DO;
payment=payment{i};
paymonth=201600+&i;
END;
END;
However, the code only worked for payment but not for paymonth. The paymonth variable contains only missing values in the output.
Grateful for any help or a better way to solve this problem.
Welcome :-)
There are several reasons why your code does not work. The main one is that you cannot mix data step and macro code in this way. Macro code runs before your data step even compiles.
Instead, use all Data Step code and do something like this. I assume that you want Paymonth to be an actual date variable. I just made up some sample data.
Feel free to ask.
data have;
array p payment_201601 - payment_201612;
do _N_ = 1 to 10;
do over p;
p = rand('integer', 1, 100);
end;
output;
end;
run;
data want;
set have;
array p payment_201601 - payment_201612;
do over p;
payment = p;
paymonth = input(compress(vname(p),, 'kd'), yymmn6.);
output;
end;
keep payment paymonth;
format paymonth yymmn6.;
run;
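If you would rather stay close to the original loop, the following is a minimal sketch of the smallest change that should work, assuming the have dataset created above. The output name want_loop and the array name pay are hypothetical; the array is renamed so that it does not clash with the new payment variable. Note that paymonth here is a plain number such as 201601, not a SAS date.

```sas
data want_loop;
  set have;
  array pay{12} payment_201601-payment_201612;
  do i = 1 to 12;
    if pay{i} > 0 then do;
      payment  = pay{i};
      /* use the data step variable i, not the macro reference &i */
      paymonth = 201600 + i;
      /* write one row per positive payment */
      output;
    end;
  end;
  keep payment paymonth;
run;
```

The explicit output inside the loop matters: without it, only the last assigned payment per input row would be written.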

How to store and pass on my SAS result value?

I would like to make decisions based on a dataset like this:
data cat;
input type ind;
datalines;
1 0
2 0
3 1
;
run;
The decision criteria are: if the minimum of ind is 0, then do action A; if the number of observations is 2, then do action B.
proc means data=cat N min;
var ind;
run;
Now I have printed out the N and min, which are what I want. But how to extract these values? In R, I can just use $, but in SAS it seems that I can only print them out in a table and store them as a dataset, not an independent variable.
Also, better not to use sql.
Thanks to @Reeza, I got my problem solved using CALL SYMPUT to store the values in macro variables.
proc means data=cat N min;
var ind;
output out=decider N=ntotal min=minimum;
run;
data _null_;
set decider;
call symput('ntotal', ntotal);
call symput('minimum',minimum);
run;
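As a sketch of how the stored values could then drive the decision, assuming the decider dataset created above: CALL SYMPUTX is used here instead of CALL SYMPUT because it strips leading and trailing blanks from the stored value. The macro name decide and the %put messages are hypothetical placeholders for the real actions A and B.

```sas
data _null_;
  set decider;
  /* symputx converts the numbers to text and trims the blanks */
  call symputx('ntotal', ntotal);
  call symputx('minimum', minimum);
run;

%macro decide;
  /* placeholder branches: replace the %put lines with the real actions */
  %if &minimum = 0 %then %put NOTE: action A would run here;
  %if &ntotal = 2 %then %put NOTE: action B would run here;
%mend decide;
%decide
```

Wrapping the %if tests in a macro keeps the sketch usable on SAS releases that do not allow %if in open code.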

Categorization of a numeric variable in SAS

I want to find a way to build another variable (even in the same dataset) that is the categorization of the old variable. I would choose the number of buckets (for example, using percentiles as cutoffs: p10, p20, p30, etc.).
At the moment I do this by extracting the percentiles of the variable with proc univariate. But this gives me only the percentiles (my cutoffs), and then I have to build the new variable manually from them.
How can I create this new variable giving the cutoffs and the number of buckets as input?
thanks in advance
Assuming you want equal percentage sized buckets, then PROC RANK might get you what you are looking for.
data test;
do i=1 to 100;
output;
end;
run;
proc rank data=test out=test2 groups=5;
var i;
ranks grp;
run;
That will give you 5 groups (named 0 .. 4), which should be equivalent to P20, P40, ..., P80 cutoffs.
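One way to check where the bucket boundaries fall is to summarise the original variable within each rank group. This is only a sketch, assuming the test2 dataset created by the PROC RANK step above.

```sas
/* smallest and largest value of i inside each rank group */
proc means data=test2 n min max;
  class grp;
  var i;
run;
```

The min and max per group show the range of values that PROC RANK assigned to each bucket.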
If you wanted non-equal buckets, i.e. P10, P40, P60, and P90, then you would have to rank at the finest granularity the cutoffs require and then combine groups. Using the test data above:
%let groups=10;
proc rank data=test out=test2 groups=&groups;
var i;
ranks grp;
run;
/*
P = (grp+1)*(100/&groups) is the upper percentile of each rank group
Cutoffs 10, 40, 60, 90
implicit 5 new groups
*/
%let n_cutoff=4;
%let cutoffs=10, 40, 60, 90;
data test3(drop=_i cutoffs:);
set test2;
array cutoffs[&n_cutoff] (&cutoffs);
P = (grp+1)*(100/&groups);
do _i=1 to &n_cutoff;
if P <= cutoffs[_i] then do;
new_grp = _i-1;
leave;
end;
if _i = &n_cutoff then
new_grp = _i;
end;
run;
10 is the greatest common divisor of the cutoff values (10, 40, 60, 90). 100/10 = 10, so we need 10 groups from PROC RANK.
The Data Step at the end combines the groups using the cutoffs you are looking for.

How to sum a variable and record the total in the last row using SAS

I have a dataset that looks like the following:
Name Number
a 1
b 2
c 9
d 6
e 5.5
Total ???
I want to calculate the sum of the variable Number and record it in the last row (the one with Name = 'Total'). I know I can do this using proc means and then merging the output back to this file, but that does not seem very efficient. Can anyone tell me whether there is a better way, please?
You can do the following in a data step:
data test2;
drop sum;
set test end = last;
retain sum;
if _n_ = 1 then sum = 0;
sum = sum + number;
output;
if last then do;
NAME = 'TOTAL';
number = sum;
output;
end;
run;
It takes just one pass through the dataset.
This is easy to do with PROC REPORT.
data have;
input Name $ Number ;
cards;
a 1
b 2
c 9
d 6
e 5.5
;
proc report data=have out=want(drop=_:);
rbreak after/ summarize ;
compute after;
name='Total';
endcomp;
run;
The following code uses the DOW-Loop (DO-Whitlock) to achieve the result by reading through the observations once, outputting each one, then lastly outputting the total:
data want(drop=tot);
do until(lastrec);
set have end=lastrec;
tot+number;
output;
end;
name='Total';
number=tot;
output;
run;
For all of the data step solutions offered, it is important to keep the length of the name variable in mind. Make sure it will accommodate both 'Total' and the original values.
proc sql;
select max(5,length) into :len trimmed from dictionary.columns WHERE LIBNAME='WORK' AND MEMNAME='TEST' AND UPCASE(NAME)='NAME';
QUIT;
data test2;
length name $ &len;
set test end=last;
...
run;

How to calculate a mean for the non zero values using proc means or proc summary

I want to calculate a mean based on the non-zero values of given variables, using proc means only.
I know we can calculate this using proc sql, but I want to get it done through proc means or proc summary.
In my study I have 8 variables, so how can I calculate the mean based on non-zero values when I am using all of them in the var statement, as below:
proc means data=xyz;
var var1 var2 var3 var4 var5 var6 var7 var8;
run;
If we take one variable at a time in the var statement and use a where condition to exclude zero values, it works. But is there something that would work for all the variables of interest listed in the var statement?
Your suggestions would be highly appreciated.
Thank you !
One method is to change all of your zero values to missing, and then use PROC MEANS.
data zeromiss /view=zeromiss ;
set xyz ;
array n{*} var1-var8 ;
do i = 1 to dim(n) ;
if n{i} = 0 then call missing(n{i}) ;
end ;
drop i ;
run ;
proc means data=zeromiss ;
var var1-var8 ;
run ;
Create a view of your input dataset. In the view, define a weight variable for each variable you want to summarise. Set the weight to 0 if the corresponding variable is 0 and 1 otherwise. Then do a weighted summary via proc means / proc summary. E.g.
data xyz_v /view = xyz_v;
set xyz;
array weights {*} weight_var1-weight_var8;
array vars {*} var1-var8;
do i = 1 to dim(vars);
weights[i] = (vars[i] ne 0);
end;
run;
%macro weighted_var(n);
%do i = 1 %to &n;
var var&i /weight = weight_var&i;
%end;
%mend weighted_var;
proc means data = xyz_v;
%weighted_var(8);
run;
This is less elegant than Chris J's solution for this specific problem, but it generalises slightly better to other situations where you want to apply different weightings to different variables in the same summary.
Can't you use a data step?
data lala;
set xyz;
drop qty;
mean = 0;
qty = 0;
if(not missing(var1) and var1 ^= 0) then do;
mean + var1;
qty + 1;
end;
if(not missing(var2) and var2 ^= 0) then do;
mean + var2;
qty + 1;
end;
/* ... repeat to all variables ... */
if(not missing(var8) and var8 ^= 0) then do;
mean + var8;
qty + 1;
end;
if qty > 0 then mean = mean/qty;
else mean = .;
run;
If you want to keep the mean in the same xyz dataset, just replace lala with xyz.