How do I put conditions around Proc Freq statements in SAS? - sas

I have the following statement
Proc Freq data =test;
tables gender;
run;
I want this to generate an output based on a condition applied to the gender variable. For example - if count of gender greater than 2 then output.
How can I do this in SAS?
Thanks

If you mean an output dataset, you can put a where clause directly in the output dataset options.
Proc Freq data =sashelp.class;
tables sex/out=sex_freq(where=(count>9));
run;

I'm not aware of how you can accomplish this only using proc freq but you can redirect the output to a data set and then print the results.
proc freq data=test;
tables gender / noprint out=tmp;
run;
proc print data=tmp;
where count > 2;
run;
Alternatively you could use proc summary, but this still requires two steps.
proc summary data=test nway;
class gender;
output out=tmp(where=(_freq_ > 2));
run;
proc print data=tmp;
run;

Related

PROC MEANS output as table

I'm trying to export quartile information on a grouped dataset as a dataset in SAS but when I run this code my output is a table with the correct information displayed but the dataset WORK.TOP_1O_PERC is only summary statistics of the set (no quartiles). Does anyone know how I can export this as the CLASS (PDX) and its 25th and 75th percentiles? Thanks!
PROC MEANS DATA=WORK.TOP_10_DX P25 P75;
CLASS PDX;
VAR AmtPaid;
OUTPUT OUT = WORK.TOP_10_PERC;
RUN;
I like the STACKODS output that is a data set which is like the default printed output.
proc means data=sashelp.class n p25 p75 stackods;
ods output summary=summary;
run;
proc print;
run;
You can use output statement with <statistics>= options.
PROC MEANS DATA=WORK.TOP_10_DX NOPRINT;
CLASS PDX;
VAR AmtPaid;
OUTPUT OUT = WORK.TOP_10_PERC P25=P25 P75=P75;
RUN;
Compared to ods output, output statement is much faster but less flexible with multiple analysis variables or by statement specified situation.

Out of Memory using PROC FREQ

I have approximately 1,000,000 rows and 25 columns of data and I'm trying to return a list of column names, the number of distinct values and whether there are missing values.
I am not able to directly code in column names in PROC SQL and count distinct as I have numerous data sets with different column names and I'm trying to automatically return the desired outcome for all tables with one piece of code.
I've tried running the following code
proc freq nlevels data= &DATASET_NAME;
ods output nlevels=nlevels ;
tables _all_ NOPRINT;
run;
This returns an out of memory error. Is there another way to achieve the result, avoiding the out of memory error.
It is unnecessary to input column name by table _all_, but it possibly makes out of memory by inputting all columns at the same time, try to separate column to do proc freq and then combine results:
proc sql;
create table name as
select name from dictionary.columns where libname='SASHELP' and memname='CLASS';
quit;
data want;
run;
data _null_;
set name;
call execute(
'proc freq data=class nlevels;
table '||name||';
ods output nlevels=nlevels;
run;
data want;
set want nlevels;
run;'
);
run;
This question is very similar to SAS summary statistic from a dataset
The answers cover techniques for
transpose + freq
hash
freq w/ ODS exclude+output

Average number of rows per variable in SAS

I have the following dataset :
data test;
input business_ID $;
datalines;
'busi1'
'busi1'
'busi1'
'busi2'
'busi3'
'busi3'
;
run;
proc freq data = test ;
table business_ID;
run;
I would like the average nummber of lines per business, that is count the total number of observations and divide it by the number of distinct businesses.
In my example : 6 observations, 3 businesses -> 6/2=3 lines per business.
I was thinking about using a proc freq or a proc mean step but so far I got only the number of lines (~freq) per business and do not know how to get to my goal.
Any idea?
You could use PROC FREQ to get the counts and then run PROC MEANS on the output.
proc freq data=test ;
tables business_id / noprint out=counts ;
run;
proc means data=counts;
var count;
run;
Or you could count them directly with PROC SQL code.
proc sql ;
select count(*)/count(distinct business_id) as mean_count
from test
;
quit;

how to print dataset without formats in SAS

I ´have a dataset with formats attached to it and I dont want to remove the formats from the dataset and when I use proc freq or proc print, I want the original values and not the formats attached.
Proc print data=mylib.data;
run;
is there any format=no option?
proc freq data=mylib.data;
tables gender;
format?????
run;
You can remove a format by specifying a null format on the PROC strep:
proc freq data=mylib.data ;
tables gender ;
format _ALL_ ;
run ;
_ALL_ is a list of all variables in the dataset.

Proc Freq and unique id values

I have a table with two columns, ID and Gender as below
I am trying to count the number of males and females. I wrote a code like this
Proc Freq data=Work.Test1;
tables gender;
run;
The output i got was 5males and 2 females, I know this is wrong because Id repeats many times , there are only 2 males and 1 female. My question is how do i change Proc Freq so that I get the count for gender (males and Females) for unique Id values ?
You can use Nlevels in proc freq
Proc freq data= yourdata NLEVELS;
tables gender /noprint;
run;
I'm not sure if this is easy to do without using SQL or data step to work it out.
proc sql;
create table want as
select gender, count(distinct id) as count
from have
group by gender;
quit;
or (sorted by gender id)
data want;
set have;
by gender id;
if first.gender then count=0;
if first.id then count+1;
if last.gender then output;
run;
PROC TABULATE might be able to do what you want, but I couldn't figure a quick method out.
Try this:
proc sort data=have out=want nodupkey;
by id gender;
proc freq data=want;
tables gender;
run;
This will give you one record per ID/gender, then you can run your freq for gender.