Saving results from SAS proc freq with multiple tables - sas

I'm a beginner in SAS and I have the following problem.
I need to calculate counts and percents of several variables (A B C) from one dataset and save the results to another dataset.
my code is:
proc freq data=mydata;
tables A B C / out=data_out ; run;
the result of the procedure for each variable appears in the SAS output window, but data_out contains the results only for the last variable. How to save them all in data_out?
Any help is appreciated.

ODS OUTPUT is your answer. You can't output directly using the OUT=, but you can output them like so:
ods output OneWayFreqs=freqs;
proc freq data=sashelp.class;
tables age height weight;
run;
ods output close;
OneWayFreqs is the one-way tables, (n>1)-way tables are CrossTabFreqs:
ods output CrossTabFreqs=freqs;
ods trace on;
proc freq data=sashelp.class;
tables age*height*weight;
run;
ods output close;
You can find out the correct name by running ods trace on; and then running your initial proc whatever (to the screen); it will tell you the names of the output in the log. (ods trace off; when you get tired of seeing it.)

Lots of good basic sas stuff to learn here
1) Run three proc freq statements (one for each variable a b c) with a different output dataset name so the datasets are not over written.
2) use a rename option on the out = statement to change the count and percent variables for when you combine the datasets
3) sort by category and merge all datasets together
(I'm assuming there are values that appear in in multiple variables, if not you could just stack the data sets)
data mydata;
input a $ b $ c$;
datalines;
r r g
g r b
b b r
r r r
g g b
b r r
;
run;
proc freq noprint data = mydata;
tables a / out = data_a
(rename = (a = category count = count_a percent = percent_a));
run;
proc freq noprint data = mydata;
tables b / out = data_b
(rename = (b = category count = count_b percent = percent_b));
run;
proc freq noprint data = mydata;
tables c / out = data_c
(rename = (c = category count = count_c percent = percent_c));
run;
proc sort data = data_a; by category; run;
proc sort data = data_b; by category; run;
proc sort data = data_c; by category; run;
data data_out;
merge data_a data_b data_c;
by category;
run;

As ever, there are lots of different ways of doing this sort of thing in SAS. Here are a couple of other options:
1. Use proc summary rather than proc freq:
proc summary data = sashelp.class;
class age height weight;
ways 1;
output out = freqs;
run;
2. Use multiple table statements in a single proc freq
This is more efficient than running 3 separate proc freq statements, as SAS only has to read the input dataset once rather than 3 times:
proc freq data = sashelp.class noprint;
table age /out = freq_age;
table height /out = freq_height;
table weight /out = freq_weight;
run;
data freqs;
informat age height weight count percent;
set freq_age freq_height freq_weight;
run;

This is a question I've dealt with many times and I WISH SAS had a better way of doing this.
My solution has been a macro that is generalized, provide your input data, your list of variables and the name of your output dataset. I take into consideration the format/type/label of the variable which you would have to do
Hope it helps:
https://gist.github.com/statgeek/c099e294e2a8c8b5580a
/*
Description: Creates a One-Way Freq table of variables including percent/count
Parameters:
dsetin - inputdataset
varlist - list of variables to be analyzed separated by spaces
dsetout - name of dataset to be created
Author: F.Khurshed
Date: November 2011
*/
%macro one_way_summary(dsetin, varlist, dsetout);
proc datasets nodetails nolist;
delete &dsetout;
quit;
*loop through variable list;
%let i=1;
%do %while (%scan(&varlist, &i, " ") ^=%str());
%let var=%scan(&varlist, &i, " ");
%put &i &var;
*Cross tab;
proc freq data=&dsetin noprint;
table &var/ out=temp1;
run;
*Get variable label as name;
data _null_;
set &dsetin (obs=1);
call symput('var_name', vlabel(&var.));
run;
%put &var_name;
*Add in Variable name and store the levels as a text field;
data temp2;
keep variable value count percent;
Variable = "&var_name";
set temp1;
value=input(&var, $50.);
percent=percent/100; * I like to store these as decimals instead of numbers;
format percent percent8.1;
drop &var.;
run;
%put &var_name;
*Append datasets;
proc append data=temp2 base=&dsetout force;
run;
/*drop temp tables so theres no accidents*/
proc datasets nodetails nolist;
delete temp1 temp2;
quit;
*Increment counter;
%let i=%eval(&i+1);
%end;
%mend;
%one_way_summary(sashelp.class, sex age, summary1);
proc report data=summary1 nowd;
column variable value count percent;
define variable/ order 'Variable';
define value / format=$8. 'Value';
define count/'N';
define percent/'Percentage %';
run;
EDIT (2022):
Better way of doing this is to use the ODS Tables:
/*This code is an example of how to generate a table with
Variable Name, Variable Value, Frequency, Percent, Cumulative Freq and Cum Pct
No macro's are required
Use Proc Freq to generate the list, list variables in a table statement if only specific variables are desired
Use ODS Table to capture the output and then format the output into a printable table.
*/
*Run frequency for tables;
ods table onewayfreqs=temp;
proc freq data=sashelp.class;
table sex age;
run;
*Format output;
data want;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
keep variable variable_value frequency percent cum:;
label variable='Variable'
variable_value='Variable Value';
run;
*Display;
proc print data=want(obs=20) label;
run;

The option STACKODS(OUTPUT) added to PROC MEANS in 9.3 makes this a much simpler task.
proc means data=have n nmiss stackods;
ods output summary=want;
run;
| Variable | N | NMiss |
| ------ | ----- | ----- |
| a | 4 | 3 |
| b | 7 | 0 |
| c | 6 | 1 |

Related

SaS 9.4: How to use different weights on the same variable without datastep or proc sql

I can't find a way to summarize the same variable using different weights.
I try to explain it with an example (of 3 records):
data pippo;
a=10;
wgt1=0.5;
wgt2=1;
wgt3=0;
output;
a=3;
wgt1=0;
wgt2=0;
wgt3=1;
output;
a=8.9;
wgt1=1.2;
wgt2=0.3;
wgt3=0.1;
output;
run;
I tried the following:
proc summary data=pippo missing nway;
var a /weight=wgt1;
var a /weight=wgt2;
var a /weight=wgt3;
output out=pluto (drop=_freq_ _type_) sum()=;
run;
Obviously it gives me a warning because I used the same variable "a" (I can't rename it!).
I've to save a huge amount of data and not so much physical space and I should construct like 120 field (a0-a6,b0-b6 etc) that are the same variables just with fixed weight (wgt0-wgt5).
I want to store a dataset with 20 columns (a,b,c..) and 6 weight (wgt0-wgt5) and, on demand, processing a "summary" without an intermediate datastep that oblige me to create 120 fields.
Due to the huge amount of data (more or less 55Gb every month) I'd like also not to use proc sql statement:
proc sql;
create table pluto
as select sum(db.a * wgt1) as a0, sum(db.a * wgt1) as a1 , etc.
quit;
There is a "Super proc summary" that can summarize the same field with different weights?
Thanks in advance,
Paolo
I think there are a few options. One is the data step view that data_null_ mentions. Another is just running the proc summary however many times you have weights, and either using ods output with the persist=proc or 20 output datasets and then setting them together.
A third option, though, is to roll your own summarization. This is advantageous in that it only sees the data once - so it's faster. It's disadvantageous in that there's a bit of work involved and it's more complicated.
Here's an example of doing this with sashelp.baseball. In your actual case you'll want to use code to generate the array reference for the variables, and possibly for the weights, if they're not easily creatable using a variable list or similar. This assumes you have no CLASS variable, but it's easy to add that into the key if you do have a single (set of) class variable(s) that you want NWAY combinations of only.
data test;
set sashelp.baseball;
array w[5];
do _i = 1 to dim(w);
w[_i] = rand('Uniform')*100+50;
end;
output;
run;
data want;
set test end=eof;
i = .;
length varname $32;
sumval = 0 ;
sum=0;
if _n_ eq 1 then do;
declare hash h_summary(suminc:'sumval',keysum:'sum',ordered:'a');;
h_summary.defineKey('i','varname'); *also would use any CLASS variable in the key;
h_summary.defineData('i','varname'); *also would include any CLASS variable in the key;
h_summary.defineDone();
end;
array w[5]; *if weights are not named in easy fashion like this generate this with code;
array vars[*] nHits nHome nRuns; *generate this with code for the real dataset;
do i = 1 to dim(w);
do j = 1 to dim(vars);
varname = vname(vars[j]);
sumval = vars[j]*w[i];
rc = h_summary.ref();
if i=1 then put varname= sumval= vars[j]= w[i]=;
end;
end;
if eof then do;
rc = h_summary.output(dataset:'summary_output');
end;
run;
One other thing to mention though... if you're doing this because you're doing something like jackknife variance estimation or that sort of thing, or anything that uses replicate weights, consider using PROC SURVEYMEANS which can handle replicate weights for you.
You can SCORE your data set using a customized SCORE data set that you can generate
with a data step.
options center=0;
data pippo;
retain a 10 b 1.75 c 5 d 3 e 32;
run;
data score;
if 0 then set pippo;
array v[*] _numeric_;
retain _TYPE_ 'SCORE';
length _name_ $32;
array wt[3] _temporary_ (.5 1 .333);
do i = 1 to dim(v);
call missing(of v[*]);
do j = 1 to dim(wt);
_name_ = catx('_',vname(v[i]),'WGT',j);
v[i] = wt[j];
output;
end;
end;
drop i j;
run;
proc print;[enter image description here][1]
run;
proc score data=pippo score=score;
id a--e;
var a--e;
run;
proc print;
run;
proc means stackods sum;
ods exclude summary;
ods output summary=summary;
run;
proc print;
run;
enter image description here

Highlight the corresponding line number with another datasets

I have two datasets, one extract the extreme values from proc univariate. I would like to create a new variable and label them as 1 if the n in the original dataset equals the extracted line number in the univariate dataset. But I don't know how to program it not manually enter the line number.
 
There're a few ways to do this, but one easy way is to just add the rownum to the original dataset and merge on it.
Here's an example.
ods output extremeobs=extreme_test;
proc univariate data=sashelp.heart;
run;
ods output close;
data extreme_diastolic extreme_systolic; *just creating the extreme datasets;
set extreme_test;
if varname='Diastolic' then output extreme_diastolic;
else if varname='Systolic' then output extreme_systolic;
run;
data for_merge; *adding rownum on to the original dataset;
set sashelp.heart;
rownum = _n_;
run;
*now, sort the extreme datasets by the `highobs` and `lowobs` values respectively and save those as `rownum`, so they can be merged;
proc sort data=extreme_diastolic out=high_diastolic(keep=highobs rename=highobs=rownum);
by highobs;
run;
proc sort data=extreme_systolic out=high_systolic(keep=highobs rename=highobs=rownum);
by highobs;
run;
proc sort data=extreme_diastolic out=low_diastolic(keep=lowobs rename=lowobs=rownum);
by lowobs;
run;
proc sort data=extreme_systolic out=low_systolic(keep=lowobs rename=lowobs=rownum);
by lowobs;
run;
*now, merge those on using `in=` to identify which are matches.;
data heart_extremes;
merge for_merge high_diastolic(in=_highd) high_systolic(in=_highs) low_diastolic(in=_lowd) low_systolic(in=_lows);
by rownum;
if _highd then high_diastolic = 1;
if _highs then high_systolic = 1;
if _lowd then low_diastolic = 1;
if _lows then low_systolic = 1;
run;

Null Columns only based on a variable value - SAS Dataset

I have a very big SAS Dataset with over 280 variables in it and I need retrieve all the complete NULL columns based on a Variable value. For example I have a Variable called Reported(with only values Yes & No) in this dataset and I want to find out based on value No, all the complete Null Columns in this dataset.
Is there any quick way to find this out with out writing all the columns names for complete NULL values?
So for example if I have 4 Variables in the table,
So based on the above table I would like to see the output like this where Var4='No' and only return the columns with all the missing values
This would help me to identify variables which are not being populated at all where the Var4 value is 'No'
Note the WHERE statement in PROC FREQ.
proc format;
value $_xmiss_(default=1 min=1 max=1) ' ' =' ' other='1';
value _xmiss_(default=1 min=1 max=1) ._-.Z=' ' other='1';
quit;
%let data=sashelp.heart;
proc freq data=&data nlevels;
where status eq: 'A';
ods select nlevels;
ods output nlevels=nlevels;
format _character_ $_xmiss_. _numeric_ _xmiss_.;
run;
data nlevels;
length TABLEVAR $32 TABLEVARLABEL $128 NLEVELS NMISSLEVELS NNONMISSLEVELS 8;
retain NLEVELS NMISSLEVELS NNONMISSLEVELS 0;
set nlevels;
run;
There are 2 parts to the question I think. First is to subset records where Reported = "N". Then among those records, report columns that have all missing values. If this is correct then you could do something as follows (I am assuming that the columns with missing values are all numeric. If not, this approach will need a slight modification):
/* Thanks to REEZA for pointing out this way of getting the freqs. This eliminates some constraints and is more efficient */
proc freq data=have nlevels ;
where var1 = "N" ;
ods output nlevels = freqs;
table _all_;
run;
proc sql noprint;
select TableVar into :cols separated by " " from freqs where NNonMissLevels = 0 ;
quit;
%put &cols;
data want;
set have (keep = &cols var1);
where var1 = "N" ;
run;

SAS: Replace rare levels in variable with new level "Other"

I've got pretty big table where I want to replace rare values (for this example that have less than 10 occurancies but real case is more complicated- it might have 1000 levels while I want to have only 15). This list of possible levels might change so I don't want to hardcode anything.
My code is like:
%let var = Make;
proc sql;
create table stage1_ as
select &var.,
count(*) as count
from sashelp.cars
group by &var.
having count >= 10
order by count desc
;
quit;
/* Join table with table including only top obs to replace rare
values with "other" category */
proc sql;
create table stage2_ as
select t1.*,
case when t2.&var. is missing then "Other_&var." else t1.&var. end as &var._new
from sashelp.cars t1 left join
stage1_ t2 on t1.&var. = t2.&var.
;
quit;
/* Drop old variable and rename the new as old */
data result;
set stage2_(drop= &var.);
rename &var._new=&var.;
run;
It works, but unfortunately it is not very officient as it needs to make a join for each variable (in real case I am doing it in loop).
Is there a better way to do it? Maybe some smart replace function?
Thanks!!
You probably don't want to change the actual data values. Instead consider creating a custom format for each variable that will map the rare values to an 'Other' category.
The FREQ procedure ODS can be used to capture the counts and percentages of every variable listed into a single table. NOTE: Freq table/out= captures only the last listed variable. Those counts can be used to construct the format according to the 'othering' rules you want to implement.
data have;
do row = 1 to 1000;
array x x1-x10;
do over x;
if row < 600
then x = ceil(100*ranuni(123));
else x = ceil(150*ranuni(123));
end;
output;
end;
run;
ods output onewayfreqs=counts;
proc freq data=have ;
table x1-x10;
run;
data count_stack;
length name $32;
set counts;
array x x1-x10;
do over x;
name = vname(x);
value = x;
if value then output;
end;
keep name value frequency;
run;
proc sort data=count_stack;
by name descending frequency ;
run;
data cntlin;
do _n_ = 1 by 1 until (last.name);
set count_stack;
by name;
length fmtname $32;
fmtname = trim(name)||'top';
start = value;
label = cats(value);
if _n_ < 11 then output;
end;
hlo = 'O';
label = 'Other';
output;
run;
proc format cntlin=cntlin;
run;
ods html;
proc freq data=have;
table x1-x10;
format
x1 x1top.
x2 x2top.
x3 x3top.
x4 x4top.
x5 x5top.
x6 x6top.
x7 x7top.
x8 x8top.
x9 x9top.
x10 x10top.
;
run;

How to write a concise list of variables in table of a freq when the variables are differentiated only by a suffix?

I have a dataset with some variables named sx for x = 1 to n.
Is it possible to write a freq which gives the same result as:
proc freq data=prova;
table s1 * s2 * s3 * ... * sn /list missing;
run;
but without listing all the names of the variables?
I would like an output like this:
S1 S2 S3 S4 Frequency
A 10
A E 100
A E J F 300
B 10
B E 100
B E J F 300
but with an istruction like this (which, of course, is invented):
proc freq data=prova;
table s1:sn /list missing;
run;
Why not just use PROC SUMMARY instead?
Here is an example using two variables from SASHELP.CARS.
So this is PROC FREQ code.
proc freq data=sashelp.cars;
where make in: ('A','B');
tables make*type / list;
run;
Here is way to get counts using PROC SUMMARY
proc summary missing nway data=sashelp.cars ;
where make in: ('A','B');
class make type ;
output out=want;
run;
proc print data=want ;
run;
If you need to calculate the percentages you can instead use the WAYS statement to get both the overall and the individual cell counts. And then add a data step to calculate the percentages.
proc summary missing data=sashelp.cars ;
where make in: ('A','B');
class make type ;
ways 0 2 ;
output out=want;
run;
data want ;
set want ;
retain total;
if _type_=0 then total=_freq_;
percent=100*_freq_/total;
run;
So if you have 10 variables you would use
ways 0 10 ;
class s1-s10 ;
If you just want to build up the string "S1*S2*..." then you could use a DO loop or a macro %DO loop and put the result into a macro variable.
data _null_;
length namelist $200;
do i=1 to 10;
namelist=catx('*',namelist,cats('S',i));
end;
call symputx('namelist',namelist);
run;
But here is an easy way to make such a macro variable from ANY variable list not just those with numeric suffixes.
First get the variables names into a dataset. PROC TRANSPOSE is a good way if you use the OBS=0 dataset option so that you only get the _NAME_ column.
proc transpose data=have(obs=0) ;
var s1-s10 ;
run;
Then use PROC SQL to stuff the names into a macro variable.
proc sql noprint;
select _name_
into :namelist separated by '*'
from &syslast
;
quit;
Then you can use the macro variable in your TABLES statement.
proc freq data=have ;
tables &namelist / list missing ;
run;
Car':
In short, no. There is no shortcut syntax for specifying a variable list that crosses dimension.
In long, yes -- if you create a surrogate variable that is an equivalent crossing.
Discussion
Sample data generator:
%macro have(top=5);
%local index;
data have;
%do index = 1 %to &top;
do s&index = 1 to 2+ceil(3*ranuni(123));
%end;
array V s:;
do _n_ = 1 to 5*ranuni(123);
x = ceil(100*ranuni(123));
if ranuni(123) < 0.1 then do;
ix = ceil(&top*ranuni(123));
h = V(ix);
V(ix) = .;
output;
V(ix) = h;
end;
else
output;
end;
%do index = 1 %to &top;
end;
%end;
run;
%mend;
%have;
As you probably noticed table s: created one freq per s* variable.
For example:
title "One table per variable";
proc freq data=have;
tables s: / list missing ;
run;
There is no shortcut syntax for specifying a variable list that crosses dimension.
NOTE: If you specify out=, the column names in the output data set will be the last variable in the level. So for above, the out= table will have a column "s5", but contain counts corresponding to combinations for each s1 through s5.
At each dimensional level you can use a variable list, as in level1 * (sublev:) * leaf. The same caveat for out= data applies.
Now, reconsider the original request discretely (no-shortcut) crossing all the s* variables:
title "1 table - 5 columns of crossings";
proc freq data=have;
tables s1*s2*s3*s4*s5 / list missing out=outEach;
run;
And, compare to what happens when a data step view uses a variable list to compute a surrogate value corresponding to the discrete combinations reported above.
data haveV / view=haveV;
set have;
crossing = catx(' * ', of s:); * concatenation of all the s variables;
keep crossing;
run;
title "1 table - 1 column of concatenated crossings";
proc freq data=haveV;
tables crossing / list missing out=outCat;
run;
Reality check with COMPARE, I don't trust eyeballs. If zero rows with differences (per noequal) then the out= data sets have identical counts.
proc compare noprint base=outEach compare=outCat out=diffs outnoequal;
var count;
run;
----- Log -----
NOTE: There were 31 observations read from the data set WORK.OUTEACH.
NOTE: There were 31 observations read from the data set WORK.OUTCAT.
NOTE: The data set WORK.DIFFS has 0 observations and 3 variables.
NOTE: PROCEDURE COMPARE used (Total process time)