creating output from proc freq in SAS - sas

I am running the following SAS code in SAS Enterprise Guide 6.1 to get some summary stats on null/not null for all the variables in a table. This is producing the desired info via the 'results' tab, which creates a separate table for each result showing null/not null frequencies and percentages.
What I'd like to do is put the results into an output dataset with all the variables and stats in a single table.
proc format;
value $missfmt ' '='Missing' other='Not Missing';
value missfmt . ='Missing' other='Not Missing';
run;
proc freq data=mydatatable;
format _CHAR_ $missfmt.;
tables _CHAR_ / out=work.out1 missing missprint nocum;
format _NUMERIC_ missfmt.;
tables _NUMERIC_ / out=work.out2 missing missprint nocum;
run;
out1 and out2 are being generated into tables like this:
FieldName | Count | Percent
Not Missing | Not Missing | Not Missing
However, each is populated with only one variable, and the frequency counts are not shown.
The table I'm trying to create as output would be:
field | Missing | Not Missing | % Missing
FieldName1 | 100 | 100 | 50
FieldName2 | 3 | 97 | 3

The OUT= option on a TABLES statement applies only to the last table requested. _CHAR_ resolves to the list of all character variables, but each variable is its own one-way table, so you only get the last one requested.
You can get this one of two ways: either use PROC TABULATE, which deals more readily with lists of variables, or use ODS OUTPUT to grab the PROC FREQ output. Either way, the output will likely take some work to get into exactly the structure you want.
ods output onewayfreqs=myfreqs; *use `ODS TRACE` to find this name if you do not know it;
proc freq data=sashelp.class;
tables _character_;
tables _numeric_;
run;
ods output close;
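As a rough sketch of the extra work mentioned above, the ODS OUTPUT data set can be reshaped into the one-row-per-field layout from the question. This assumes the formats from the question, uses sashelp.heart as a stand-in for mydatatable, and relies on the Table column of OneWayFreqs holding text of the form "Table varname". The names long, want, field, status, n_missing, n_not_missing and pct_missing are made up for illustration.

```sas
proc format;
  value $missfmt ' '='Missing' other='Not Missing';
  value  missfmt  . ='Missing' other='Not Missing';
run;

/* Capture every one-way table in a single data set */
ods output onewayfreqs=myfreqs;
proc freq data=sashelp.heart;
  format _character_ $missfmt. _numeric_ missfmt.;
  tables _all_ / missing nocum;
run;

/* One row per field and status */
data long;
  length field $32 status $11;
  set myfreqs;
  field  = scan(table, 2, ' ');          /* Table looks like "Table Sex" */
  status = left(coalescec(of f_:));      /* only one F_ column is filled per row */
  keep field status frequency;
run;

/* Collapse to one row per field */
data want;
  set long;
  by field notsorted;                    /* rows are already grouped by table */
  retain n_missing n_not_missing;
  if first.field then do;
    n_missing = 0;
    n_not_missing = 0;
  end;
  if status = 'Missing' then n_missing = frequency;
  else n_not_missing = frequency;
  if last.field then do;
    pct_missing = 100 * n_missing / (n_missing + n_not_missing);
    output;
  end;
  keep field n_missing n_not_missing pct_missing;
run;
```

The MISSING option matters here: without it, missing values are left out of the table rows, so there would be no 'Missing' row to count.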

Related

SAS - PROC SQL: How to show predicted values in a table using PROC REG?

I have run a regression on a data set. I would like to then add the predicted values into the original data set table. I would like the PredictedMS_Diff values to be added to the PROPreg_CSR_final dataset.
proc reg data=PROPreg_CSR_final outest=outest_model_1 covout plots=diagnostics(stats=(default aic
sbc));
title "CSR Final";
FinalCSR:MODEL MS_Diff_CSR=Rank_Delta_prop;
Output PREDICTED=PredictedMS_Diff
run;
title;
Your OUTPUT statement has no OUT= option, so SAS names the output data set itself (DATA1, DATA2, and so on). It is also missing its terminating semicolon:
Output PREDICTED=PredictedMS_Diff
If that had worked, the result would have been a copy of the input data with PredictedMS_Diff added. Name the data set explicitly with OUT=:
proc reg data=sashelp.class;
model weight=height;
output out=pred predicted=p residual=r;
run;

Append of Tables with the Same Variables but Differing Attributes

My question is about the append of two different tables that are supposed to have the same name/format/type/length variables.
I am trying to create a step in my SAS program where I don't allow my program to be executed if the format/type/length of variables with the same name is not the same.
For example, when in one table I have a date in type string "dd-mm-yyyy" and in the other table I have the "yyyy-mm-dd" or "dd-mm-yyyy hh:mm:ss". After the append, our daily executions based on these input tables didn't work as expected. Sometimes the values come up as missing or out of order, since the formats are different.
I tried using the PROC COMPARE statement, which allowed me to check which variables have Differing Attributes (Type, Length, Format, InFormat and Labels).
proc compare base = SAS-data-set
compare = SAS-data-set;
run;
However, this only gave me a printed listing of the common variables with differing attributes, and I could not do anything with it programmatically.
On the other hand, I would like to know if there's a chance to have a structured output table with this information, in order to use it as a control statement.
Creating an automatic task to do it would save me a lot of time.
You can use Proc CONTENTS to get information about a data set's variables. Do that for both data sets, and then use Proc COMPARE to create a data set that reports the differences in variable attributes.
data cars1;
set sashelp.cars (obs=10);
date = today ();
format date date9.;
cars1_only = 1;
x = 1.458; label x = "x-factor";
run;
data cars2;
length type $50;
set sashelp.cars (obs=10);
format date yymmdd10.;
cars2_only = 1;
X = 1.548; label x = "X factor to apply";
run;
proc contents noprint data=cars1 out=cars1_contents;
proc contents noprint data=cars2 out=cars2_contents;
run;
data cars1_contents;
set cars1_contents;
upName = upcase(Name);
run;
data cars2_contents;
set cars2_contents;
upName = upcase(Name);
run;
proc sort data=cars1_contents; by upName;
proc sort data=cars2_contents; by upName;
run;
proc compare noprint
base=cars1_contents
compare=cars2_contents
outall
out=cars_contents_compare (where=(_TYPE_ ne 'PERCENT'))
;
by upName;
run;
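To use this as a control step, one possible sketch is to compare the two contents data sets directly and stop when any common variable differs. It continues from the sorted cars1_contents and cars2_contents data sets built above and uses the TYPE, LENGTH, FORMAT and FORMATL columns that Proc CONTENTS writes to its OUT= data set. The names attr_diffs, n_diffs and stop_if_diffs are hypothetical, and which attributes should block the append is an assumption you would adjust.

```sas
/* Keep only common variables whose attributes differ */
data attr_diffs;
  merge cars1_contents(keep=upName type length format formatl
                       rename=(type=type1 length=length1
                               format=format1 formatl=formatl1)
                       in=in1)
        cars2_contents(keep=upName type length format formatl
                       rename=(type=type2 length=length2
                               format=format2 formatl=formatl2)
                       in=in2);
  by upName;
  if in1 and in2;
  if type1 ne type2 or length1 ne length2
     or format1 ne format2 or formatl1 ne formatl2;
run;

/* Count the mismatches into a macro variable */
proc sql noprint;
  select count(*) into :n_diffs trimmed from attr_diffs;
quit;

/* Cancel the rest of the submitted program if anything differs */
%macro stop_if_diffs;
  %if &n_diffs > 0 %then %do;
    %put ERROR: &n_diffs common variable(s) have differing attributes. Append skipped.;
    %abort cancel;
  %end;
%mend stop_if_diffs;
%stop_if_diffs
```

Place the append after the macro call so that it only runs when attr_diffs is empty.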
There is also an ODS table you can capture directly without having to run Proc CONTENTS, but the capture is not 'data-rific' (it is laid out for reading rather than for processing):
ods output CompareVariables=work.cars_vars;
proc compare base=cars1 compare=cars2;
run;

Out of Memory using PROC FREQ

I have approximately 1,000,000 rows and 25 columns of data and I'm trying to return a list of column names, the number of distinct values and whether there are missing values.
I am not able to directly code in column names in PROC SQL and count distinct as I have numerous data sets with different column names and I'm trying to automatically return the desired outcome for all tables with one piece of code.
I've tried running the following code
proc freq nlevels data= &DATASET_NAME;
ods output nlevels=nlevels ;
tables _all_ / NOPRINT;
run;
This returns an out of memory error. Is there another way to achieve the result while avoiding the out of memory error?
TABLES _ALL_ saves you from typing the column names, but requesting every column at once may be what exhausts memory. Try running PROC FREQ on one column at a time and then combining the results:
proc sql;
create table name as
select name from dictionary.columns where libname='SASHELP' and memname='CLASS';
quit;
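*start with an empty data set so no blank row is carried along;
data want;
stop;
run;
*generate one PROC FREQ step per column and stack the results;
data _null_;
set name;
call execute(
'ods output nlevels=nlevels; '||
'proc freq data=sashelp.class nlevels; '||
'tables '||strip(name)||' / noprint; '||
'run; '||
'data want; length TableVar $32; set want nlevels; run;'
);
run;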
This question is very similar to "SAS summary statistic from a dataset".
The answers there cover these techniques:
transpose + freq
hash
freq w/ ODS exclude+output

Output the dropped/excluded observation in Proc GLIMMIX - SAS

When I run a proc glimmix in SAS, sometimes it drops observations.
How do I get the set of dropped/excluded observations or maybe the set of included observations so that I can identify the dropped set?
My current Proc GLIMMIX code is as follows:
%LET EST=inputf.aarefestimates;
%LET MODEL_VAR3 = age Male Yearc2010 HOSPST
Hx_CTSURG Cardiogenic_Shock COPD MCANCER DIABETES;
data work.refmodel;
set inputf.readmref;
Yearc2010 = YEAR - 2010;
run;
PROC GLIMMIX DATA = work.refmodel NOCLPRINT MAXLMMUPDATE=100;
CLASS hospid HOSPST(ref="xx");
ODS OUTPUT PARAMETERESTIMATES = &est (KEEP=EFFECT ESTIMATE STDERR);
MODEL RADM30 = &MODEL_VAR3 /Dist=b LINK=LOGIT SOLUTION;
XBETA=_XBETA_;
LINP=_LINP_;
RANDOM INTERCEPT/SUBJECT= hospid SOLUTION;
OUTPUT OUT = inputf.aar
PRED(BLUP ILINK)=PREDPROB PRED(NOBLUP ILINK)=EXPPROB;
ID XBETA LINP hospst hospid Visitlink Key RADM30;
NLOPTIONS TECH=NRRIDG;
run;
Thank you in advance!
It drops records that have a missing value in any variable used in the model, that is, any variable named in a CLASS, BY, MODEL, or RANDOM statement. So you can check for missing values among those variables to see which records are affected. Usually the output data set also indicates this, because records that were not used have no predictions.
You can run the code below.
*copy sample data;
data heart;
set sashelp.heart;
run;
*Logistic regression model, ageCHDdiag is missing for many records;
proc logistic data=heart;
class sex / param=ref;
model status(event='Dead') = ageCHDdiag height weight diastolic;
*generate output data;
output out=want p=pred;
run;
*explicitly flag records as included;
data included;
set want;
if missing(pred) then include='N'; else include='Y';
run;
*check that Y equals total obs included above;
proc freq data=included;
table include;
run;
The output will show:
The LOGISTIC Procedure
Model Information
Data Set WORK.HEART
Response Variable Status
Number of Response Levels 2
Model binary logit
Optimization Technique Fisher's scoring
Number of Observations Read 5209
Number of Observations Used 1446
And then the PROC FREQ will show:
The FREQ Procedure
                                    Cumulative    Cumulative
include    Frequency     Percent     Frequency       Percent
------------------------------------------------------------
N               3763       72.24          3763         72.24
Y               1446       27.76          5209        100.00
So the 1,446 observations that PROC LOGISTIC reports as used match the 1,446 records flagged Y in the output data set.
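The other check mentioned above, looking for missing values among the model variables directly, could be sketched like this. It assumes the same sashelp.heart example and that the variables listed are exactly those in the CLASS and MODEL statements. The names flagged, n_miss and dropped are made up for illustration.

```sas
data flagged;
  set sashelp.heart;
  /* CMISS counts missing values across numeric and character arguments */
  n_miss  = cmiss(status, sex, ageCHDdiag, height, weight, diastolic);
  dropped = (n_miss > 0);   /* 1 if any model variable is missing */
run;

proc freq data=flagged;
  tables dropped;
run;
```

If the variable list is complete, the count for dropped=0 should agree with the Number of Observations Used reported by the procedure. A mismatch suggests a variable has been left out of the list.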
I think I answered my question.
The line
OUTPUT OUT = inputf.aar
writes the model output to a data set. That table includes all the observations used by the procedure, so I can match it against my input table and find the observations that were dropped.
@Reeza - I had already looked for missing values in all the columns of the data, but counting the records with missing values was not enough to identify which records were being dropped. Thanks for the suggestion though.

Converting proc freq output into a dataset

I have a SAS data set with 564 variables. I need to create a new table with three columns: column1 will be the variable name, column2 will be a value of that variable, and column3 will be the observation number.
So if I have a variable gender, then gender will be listed twice in the variable column, with the value m in the first row and f in the second row, and column3 will just be the row number. This is how it should look. Many thanks in advance.
var value obs
gender m 1
gender f 2
ans yes 3
ans no 4
The key to getting a table out of PROC FREQ that has all the values in one column is ods output combined with coalescec.
ODS OUTPUT lets you tell PROC FREQ to put everything into one dataset (as opposed to out= which just puts one Freq table into one dataset). That gives you a slightly messy result, which we then use coalescec to fix. That function takes a list of variables and returns the first nonmissing value from them; since the F_ variables always have just one value populated (the formatted value of the variable in the table), it's easy to use them.
ods output onewayfreqs=freqs;
proc freq data=sashelp.class;
tables age sex;
run;
ods output close; *technically unneeded but makes it more clear;
data want;
set freqs;
value = left(coalescec(of f_:));
run;
The rest of what you note above is trivial from that dataset.
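For completeness, here is one way that last step might look. It assumes the Table column of OneWayFreqs holds text of the form "Table varname". The names varname, value and obs are chosen to mirror the layout in the question, and the lengths are arbitrary.

```sas
ods output onewayfreqs=freqs;
proc freq data=sashelp.class;
  tables age sex;
run;

data want;
  length varname $32 value $40;
  set freqs;
  varname = scan(table, 2, ' ');        /* "Table Sex" becomes "Sex" */
  value   = left(coalescec(of f_:));    /* formatted value of this level */
  obs     = _n_;                        /* running row number */
  keep varname value obs;
run;
```

With 564 variables, `tables _all_;` in place of the explicit list would produce the same structure for every variable.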